[jira] [Created] (PHOENIX-1465) Provide a configuration option to disable spooling query results to disk

Jan Fernando (JIRA) Tue, 18 Nov 2014 10:15:23 -0800

Jan Fernando created PHOENIX-1465:
-------------------------------------

             Summary: Provide a configuration option to disable spooling query results to disk
                 Key: PHOENIX-1465
                 URL: https://issues.apache.org/jira/browse/PHOENIX-1465
             Project: Phoenix
          Issue Type: Bug
    Affects Versions: 4.2
            Reporter: Jan Fernando



For compliance and disk space reasons, there are use cases where we as users 
need to provide a strong guarantee that Phoenix will not spool data to disk, 
across a heterogeneous set of query patterns.

Currently all scans run through the SpoolingResultIterator. As part of 
delegating to the underlying iterators that perform the scan, its constructor 
does the following:

{code}
DeferredFileOutputStream spoolTo = new DeferredFileOutputStream(size, tempFile) {
    @Override
    protected void thresholdReached() throws IOException {
        super.thresholdReached();
        chunk.close();
    }
};
DataOutputStream out = new DataOutputStream(spoolTo);
final long maxBytesAllowed = maxSpoolToDisk == -1 ?
        Long.MAX_VALUE : thresholdBytes + maxSpoolToDisk;
long bytesWritten = 0L;
int maxSize = 0;
for (Tuple result = scanner.next(); result != null; result = scanner.next()) {
    int length = TupleUtil.write(result, out);
    bytesWritten += length;
    if (bytesWritten > maxBytesAllowed) {
        throw new SpoolTooBigToDiskException("result too big, max allowed(bytes): " + maxBytesAllowed);
    }
    maxSize = Math.max(length, maxSize);
}
{code}

Every scan goes through the spooling iterator. From reading the code, it looks 
like even if we configure the spool size to 0, the limit is only checked after 
the data has been written to the DataOutputStream, so a spool file could still 
be written.
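
To make the ordering concern concrete, here is a minimal, self-contained sketch. It does not use Phoenix or commons-io classes: ThresholdStream is a hypothetical stand-in that, by assumption, switches to a file as soon as a write pushes the byte count past its threshold, and fileCreatedBeforeLimitCheck is an illustrative helper that follows the same write-then-check order as the loop quoted above.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class SpoolCheckDemo {
    // Follows the write-then-check order of the quoted loop and reports
    // whether a spool file already existed when the limit check was reached.
    static boolean fileCreatedBeforeLimitCheck(int thresholdBytes, long maxSpoolToDisk) {
        try {
            File temp = File.createTempFile("spool-demo", ".bin");
            if (!temp.delete()) { // start without a file so creation is observable
                throw new IOException("could not reset " + temp);
            }
            long maxBytesAllowed = maxSpoolToDisk == -1
                    ? Long.MAX_VALUE : thresholdBytes + maxSpoolToDisk;
            byte[] row = {1, 2, 3, 4};
            try (ThresholdStream out = new ThresholdStream(thresholdBytes, temp)) {
                out.write(row);                     // the data is written first
                boolean fileExists = temp.exists(); // state when the check runs
                // Only at this point would row.length be compared against
                // maxBytesAllowed and an exception thrown.
                return fileExists;
            } finally {
                temp.delete(); // clean up after the stream has been closed
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("threshold 0:    " + fileCreatedBeforeLimitCheck(0, 0));
        System.out.println("threshold 1024: " + fileCreatedBeforeLimitCheck(1024, 0));
    }
}

// Hypothetical stand-in for a deferred file stream: buffers in memory and
// switches to a file once a write would push the total past the threshold.
class ThresholdStream extends OutputStream {
    private final int threshold;
    private final File file;
    private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private OutputStream current = memory;
    private long written = 0;

    ThresholdStream(int threshold, File file) {
        this.threshold = threshold;
        this.file = file;
    }

    @Override
    public void write(int b) throws IOException {
        if (current == memory && written + 1 > threshold) {
            FileOutputStream disk = new FileOutputStream(file); // creates the file
            memory.writeTo(disk); // move anything buffered so far onto disk
            current = disk;
        }
        current.write(b);
        written++;
    }

    @Override
    public void close() throws IOException {
        current.close();
    }
}
```

Under these assumptions, a threshold of 0 means the very first byte lands in a file, so any size check that runs after the write is too late to prevent the file from being created.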

I think it would be much more straightforward if we:
a) Had a simple boolean configuration option that allows us to disable spooling
b) Bypassed the spooling iterator and the above logic whenever that option 
disables spooling
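
A rough sketch of how (a) and (b) could fit together, using only the JDK. All names here are assumptions for illustration: the property key example.query.spool.enabled is not an actual Phoenix setting, and BufferingIterator merely stands in for the spooling path by draining the scan up front.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;

class SpoolBypassSketch {
    // Hypothetical property name; not an actual Phoenix configuration key.
    static final String SPOOL_ENABLED = "example.query.spool.enabled";

    // Stand-in for the spooling path: consumes the whole scan in its
    // constructor, the way the quoted constructor drains the scanner.
    static final class BufferingIterator implements Iterator<String> {
        private final Iterator<String> buffered;

        BufferingIterator(Iterator<String> scan) {
            List<String> rows = new ArrayList<>();
            scan.forEachRemaining(rows::add);
            this.buffered = rows.iterator();
        }

        @Override
        public boolean hasNext() {
            return buffered.hasNext();
        }

        @Override
        public String next() {
            return buffered.next();
        }
    }

    // (a) a single boolean switch, defaulting to the current behaviour;
    // (b) when it is false, the scan iterator is returned untouched, so no
    //     buffering or spooling code runs at all.
    static Iterator<String> newResultIterator(Properties config, Iterator<String> scan) {
        boolean spoolEnabled =
                Boolean.parseBoolean(config.getProperty(SPOOL_ENABLED, "true"));
        return spoolEnabled ? new BufferingIterator(scan) : scan;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(SPOOL_ENABLED, "false");
        Iterator<String> scan = List.of("row1", "row2").iterator();
        Iterator<String> results = newResultIterator(config, scan);
        System.out.println("bypassed: " + (results == scan));
    }
}
```

Making the decision before any iterator is constructed is what gives the strong guarantee: with the switch off, the write-then-check path is never entered.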



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
