[
https://issues.apache.org/jira/browse/DRILL-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493341#comment-17493341
]
ASF GitHub Bot commented on DRILL-8134:
---------------------------------------
vvysotskyi commented on a change in pull request #2460:
URL: https://github.com/apache/drill/pull/2460#discussion_r808223918
##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/DrillCompressionCodecFactory.java
##########
@@ -86,6 +99,9 @@ public BytesInputCompressor getCompressor(CompressionCodecName codecName) {
         codecName,
         c -> new AirliftBytesInputCompressor(codecName, allocator)
       );
+    } else if (codecName == CompressionCodecName.GZIP) {
+      // hack for gzip: construct a new codec factory every time to avoid a concurrency bug, cf. DRILL-8139
+      return CodecFactory.createDirectCodecFactory(config, allocator, pageSize).getCompressor(codecName);
Review comment:
If I recall correctly, the codec factory should manage the full lifecycle of
the compressors and decompressors it creates, releasing them when its
`CompressionCodecFactory.release()` method is called, and so on. With this
change that becomes impossible, because the factory created here is never
retained anywhere.
> Regression: cannot query Parquet INT96 columns as timestamps
> ------------------------------------------------------------
>
> Key: DRILL-8134
> URL: https://issues.apache.org/jira/browse/DRILL-8134
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.20.0
> Reporter: James Turton
> Assignee: James Turton
> Priority: Blocker
> Labels: Regression
> Fix For: 1.20.0
>
> Attachments: result.tar.gz
>
>
> Set store.parquet.reader.int96_as_timestamp = true and then query a file with
> an INT96 timestamp, such as the one in the attachment. INT96 columns get
> downcast to 64-bit timestamps, a fact that is ignored by some buggy new
> write-buffer index positioning code merged during the 1.20 dev cycle.
> [^result.tar.gz]
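>
> A minimal JDBC sketch of the reproduction steps above (the drillbit address
> and Parquet file path are illustrative assumptions):
> {code:java}
> // Hedged reproduction sketch: assumes a Drillbit on localhost and a
> // hypothetical Parquet file /tmp/int96.parquet with an INT96 timestamp column.
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.Statement;
>
> public class Int96TimestampRepro {
>   public static void main(String[] args) throws Exception {
>     try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost");
>          Statement stmt = conn.createStatement()) {
>       // Enable the INT96 -> TIMESTAMP conversion that triggers the regression.
>       stmt.execute("ALTER SESSION SET `store.parquet.reader.int96_as_timestamp` = true");
>       try (ResultSet rs = stmt.executeQuery("SELECT * FROM dfs.`/tmp/int96.parquet`")) {
>         while (rs.next()) {
>           System.out.println(rs.getTimestamp(1)); // NPE surfaces during the scan on 1.20.0
>         }
>       }
>     }
>   }
> }
> {code}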
>
> {code:java}
> Caused by: java.lang.NullPointerException:
>   at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:234)
>   at org.apache.drill.exec.physical.impl.ScanBatch.internalNext(ScanBatch.java:234)
>   at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:298)
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:111)
>   at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
>   at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:85)
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:170)
>   at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103)
>   at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
>   at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.lambda$run$0(FragmentExecutor.java:321)
>   at .......(:0)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:310)
>   at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> {code}