[
https://issues.apache.org/jira/browse/DRILL-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17493341#comment-17493341
]
ASF GitHub Bot commented on DRILL-8134:
---------------------------------------
vvysotskyi commented on a change in pull request #2460:
URL: https://github.com/apache/drill/pull/2460#discussion_r808223918
##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/compression/DrillCompressionCodecFactory.java
##########
@@ -86,6 +99,9 @@ public BytesInputCompressor getCompressor(CompressionCodecName codecName) {
         codecName,
         c -> new AirliftBytesInputCompressor(codecName, allocator)
       );
+    } else if (codecName == CompressionCodecName.GZIP) {
+      // hack for gzip: construct a new codec factory every time to avoid a concurrency bug, cf. DRILL-8139
+      return CodecFactory.createDirectCodecFactory(config, allocator, pageSize).getCompressor(codecName);
Review comment:
If I recall correctly, the codec factory should manage the full lifecycle of
the compressors and decompressors it creates, releasing them when its
`CompressionCodecFactory.release()` method is called, and so on. With this
change that becomes impossible, because the factory created here is never
retained anywhere.
> Regression: cannot query Parquet INT96 columns as timestamps
> ------------------------------------------------------------
>
> Key: DRILL-8134
> URL: https://issues.apache.org/jira/browse/DRILL-8134
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.20.0
> Reporter: James Turton
> Assignee: James Turton
> Priority: Blocker
> Labels: Regression
> Fix For: 1.20.0
>
> Attachments: result.tar.gz
>
>
> Set store.parquet.reader.int96_as_timestamp = true and then query a file with
> an INT96 timestamp, such as the one in the attachment. INT96 columns get
> downcast to 64-bit timestamps, a fact that is ignored by some buggy new
> write-buffer index positioning code merged during the 1.20 dev cycle.
> [^result.tar.gz]
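>
> A minimal JDBC sketch of the reproduction steps above (the drillbit address
> and Parquet file path are illustrative assumptions):
> {code:java}
> // Hedged reproduction sketch: assumes a Drillbit on localhost and a
> // hypothetical Parquet file /tmp/int96.parquet with an INT96 timestamp column.
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.Statement;
>
> public class Int96TimestampRepro {
>   public static void main(String[] args) throws Exception {
>     try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost");
>          Statement stmt = conn.createStatement()) {
>       // Enable the INT96 -> TIMESTAMP conversion that triggers the regression.
>       stmt.execute("ALTER SESSION SET `store.parquet.reader.int96_as_timestamp` = true");
>       try (ResultSet rs = stmt.executeQuery("SELECT * FROM dfs.`/tmp/int96.parquet`")) {
>         while (rs.next()) {
>           System.out.println(rs.getTimestamp(1)); // NPE surfaces during the scan on 1.20.0
>         }
>       }
>     }
>   }
> }
> {code}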
>
> {code:java}
> Caused by: java.lang.NullPointerException:
>   at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:234)
>   at org.apache.drill.exec.physical.impl.ScanBatch.internalNext(ScanBatch.java:234)
>   at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:298)
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:111)
>   at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
>   at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:85)
>   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:170)
>   at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103)
>   at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
>   at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.lambda$run$0(FragmentExecutor.java:321)
>   at .......(:0)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
>   at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:310)
>   at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
> {code}