Jan Fernando created PHOENIX-1465:
-------------------------------------
Summary: Provide a configuration option to disable spooling query
results to disk
Key: PHOENIX-1465
URL: https://issues.apache.org/jira/browse/PHOENIX-1465
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.2
Reporter: Jan Fernando
For compliance and disk space reasons, there are use cases where users need
a strong guarantee that Phoenix will not spool data to disk, across a
heterogeneous set of query patterns.
Currently all scans run through the SpoolingResultIterator, whose constructor
does the following while delegating to the underlying iterators that perform
the scan:
{code}
DeferredFileOutputStream spoolTo = new DeferredFileOutputStream(size, tempFile) {
    @Override
    protected void thresholdReached() throws IOException {
        super.thresholdReached();
        chunk.close();
    }
};
DataOutputStream out = new DataOutputStream(spoolTo);
final long maxBytesAllowed = maxSpoolToDisk == -1 ?
        Long.MAX_VALUE : thresholdBytes + maxSpoolToDisk;
long bytesWritten = 0L;
int maxSize = 0;
for (Tuple result = scanner.next(); result != null; result = scanner.next()) {
    int length = TupleUtil.write(result, out);
    bytesWritten += length;
    if (bytesWritten > maxBytesAllowed) {
        throw new SpoolTooBigToDiskException(
                "result too big, max allowed(bytes): " + maxBytesAllowed);
    }
    maxSize = Math.max(length, maxSize);
}
{code}
Every scan goes through the spooling iterator. From reading the code, it looks
like even if we configure the spool size to 0, the size check only runs after
the data has been written to the DataOutputStream, so a spool file could
already have been written by the time the check fails.
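
To make the ordering concern concrete, here is a minimal sketch using only the JDK. ThresholdStream is a hypothetical stand-in for DeferredFileOutputStream: it records the point where a real stream would switch to its temp file instead of creating one, and it assumes the threshold is evaluated as part of the write. The loop mirrors the write-then-check order of the snippet above; all names are illustrative and not Phoenix code.

```java
import java.io.OutputStream;

public class SpoolOrderingSketch {

    /** Hypothetical stand-in for DeferredFileOutputStream; records the spill instead of opening a file. */
    static final class ThresholdStream extends OutputStream {
        private final int threshold;
        private long written = 0;
        boolean spilledToDisk = false;

        ThresholdStream(int threshold) {
            this.threshold = threshold;
        }

        @Override
        public void write(int b) {
            // Assumption: the threshold is evaluated as part of the write itself.
            if (!spilledToDisk && written + 1 > threshold) {
                spilledToDisk = true; // a real stream would create the temp file here
            }
            written++;
        }
    }

    /** Same order as the quoted loop: write the tuple first, then compare against the limit. */
    static boolean spillsBeforeLimitCheck(int thresholdBytes, long maxSpoolToDisk, byte[][] tuples) {
        ThresholdStream spoolTo = new ThresholdStream(thresholdBytes);
        long maxBytesAllowed = maxSpoolToDisk == -1
                ? Long.MAX_VALUE : thresholdBytes + maxSpoolToDisk;
        long bytesWritten = 0L;
        try {
            for (byte[] tuple : tuples) {
                for (byte b : tuple) {
                    spoolTo.write(b);
                }
                bytesWritten += tuple.length;
                if (bytesWritten > maxBytesAllowed) {
                    throw new IllegalStateException(
                            "result too big, max allowed(bytes): " + maxBytesAllowed);
                }
            }
        } catch (IllegalStateException e) {
            // The limit check fired, but only after the write had already happened.
        }
        return spoolTo.spilledToDisk;
    }

    public static void main(String[] args) {
        byte[][] tuples = { {1, 2, 3} };
        // Threshold 0 and max spool 0: the spill is recorded before the check throws.
        System.out.println(spillsBeforeLimitCheck(0, 0, tuples)); // prints "true"
    }
}
```

With both limits set to 0, the stand-in reports a spill even though the loop goes on to reject the result. That is the behavior described above, under the stated assumption about when the threshold is evaluated.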
I think it would be much more straightforward if we:
a) Had a simple boolean configuration option that allows us to disable spooling
b) Bypassed the spooling iterator and the above logic whenever this option
disables spooling
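
A rough sketch of what (a) and (b) could look like, using only the JDK. The property name example.query.spooling.enabled, the wrap method and BufferingIterator are all hypothetical names invented for illustration; none of them is an existing Phoenix setting or class. BufferingIterator only stands in for the spooling path by draining the scan into memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;

public class SpoolBypassSketch {

    // Hypothetical property name, not an existing Phoenix configuration key.
    static final String SPOOLING_ENABLED = "example.query.spooling.enabled";

    /** Stand-in for the spooling path: drains the whole scan into a buffer up front. */
    static final class BufferingIterator implements Iterator<String> {
        private final Iterator<String> buffered;

        BufferingIterator(Iterator<String> scan) {
            List<String> copy = new ArrayList<>();
            scan.forEachRemaining(copy::add);
            this.buffered = copy.iterator();
        }

        @Override
        public boolean hasNext() {
            return buffered.hasNext();
        }

        @Override
        public String next() {
            return buffered.next();
        }
    }

    /** Returns the scan untouched when spooling is disabled, so no spooling code runs at all. */
    static Iterator<String> wrap(Iterator<String> scan, Properties config) {
        // Defaults to "true" so that existing behavior is kept unless explicitly disabled.
        boolean spoolingEnabled =
                Boolean.parseBoolean(config.getProperty(SPOOLING_ENABLED, "true"));
        return spoolingEnabled ? new BufferingIterator(scan) : scan;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(SPOOLING_ENABLED, "false");
        Iterator<String> scan = Arrays.asList("row1", "row2").iterator();
        // With spooling disabled the caller gets the underlying iterator back.
        System.out.println(wrap(scan, config) == scan); // prints "true"
    }
}
```

Making the decision before any wrapper is constructed is what gives the strong guarantee: with the flag off, the write-then-check loop is never entered.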