sachouche opened a new pull request #1693: DRILL-7100: Fixed 
IllegalArgumentException when reading Parquet data
URL: https://github.com/apache/drill/pull/1693
 
 
   **ANALYSIS**
   * Code inspection suggests a rare scenario in which this issue can occur. Example:
   * A row group has a variable-length column whose values have the following sizes:
   v1 (1k)
   v2 (1k)
   v3 (1MB)
   v4 (1MB)
   v5 (1MB)
   .
   .
   v16 (1MB)
   v17 (1k)
   ... the remaining values are 1k each
   
   * This scenario can a) make the overall number of rows quite large (a few thousand) and b) inflate the column precision during a batch
   * The batch sizing logic uses integers to assess the memory requirement, so the product NUM_BATCH_ROWS * COLUMN_PRECISION can overflow
   * Integers are used because most of the Drill framework uses integers to compute memory limits
   * The code handles this use case, but there is one small window in which the logic can break: when the memory usage of all columns is being aggregated
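
As a rough illustration of the overflow described above, the sketch below multiplies a row count by a column precision using 32-bit arithmetic. The class name, method name, and concrete values (3000 rows, 1 MB precision) are assumptions chosen for the example; they are not taken from the Drill code base.

```java
// Hypothetical illustration only; these names do not come from Drill.
class BatchSizeOverflowSketch {

    // 32-bit multiplication: the result silently wraps around when the
    // true product does not fit in an int.
    static int memoryAsInt(int numBatchRows, int columnPrecision) {
        return numBatchRows * columnPrecision;
    }

    public static void main(String[] args) {
        int numBatchRows = 3000;        // assumed: "a few thousand" rows
        int columnPrecision = 1 << 20;  // assumed: precision inflated to 1 MB

        // The true product is 3,145,728,000, which exceeds Integer.MAX_VALUE
        // (2,147,483,647), so the int result wraps to a negative number.
        System.out.println(memoryAsInt(numBatchRows, columnPrecision));
    }
}
```

A negative memory estimate of this kind is the sort of value that downstream size checks are unlikely to expect.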
   
   **FIX**
   * Updated the Parquet batch sizing logic to use long when computing memory-related checks
   * This way, a large precision multiplied by the current batch size will not overflow
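
The idea behind the fix can be sketched as follows: widening one operand to long before multiplying keeps the product exact. This is a minimal sketch and not the actual patch; the class, the method names, and the memory limit parameter are hypothetical.

```java
// Hypothetical illustration only; these names do not come from Drill.
class BatchSizeLongSketch {

    // Casting one operand to long forces a 64-bit multiplication, so the
    // product of two int values can no longer overflow.
    static long memoryAsLong(int numBatchRows, int columnPrecision) {
        return (long) numBatchRows * columnPrecision;
    }

    // Example memory check carried out entirely in long arithmetic.
    static boolean fitsInBudget(int numBatchRows, int columnPrecision, long memoryLimit) {
        return memoryAsLong(numBatchRows, columnPrecision) <= memoryLimit;
    }

    public static void main(String[] args) {
        long limit = 16L * 1024 * 1024; // assumed 16 MB budget for the example

        System.out.println(memoryAsLong(3000, 1 << 20));        // exact product
        System.out.println(fitsInBudget(3000, 1 << 20, limit)); // over budget
        System.out.println(fitsInBudget(10, 1024, limit));      // within budget
    }
}
```

Where failing fast is preferable to widening, `Math.multiplyExact(int, int)` is a standard alternative that throws `ArithmeticException` on overflow.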
