jnturton opened a new pull request #2463: URL: https://github.com/apache/drill/pull/2463
# [DRILL-8139](https://issues.apache.org/jira/browse/DRILL-8139): Data corruption and occasional segfaults querying Parquet/gzip under the async column reader and sync page reader

## Description

The gzip codec objects returned by the Parquet library's codec factory are not thread safe. Here we work around the problem by creating, and later releasing, single-use codec factories for gzip. As a result, many codec factories can be created while reading a Parquet file that contains gzip-compressed column data. This is unnatural and unfortunate, but [the added overhead does appear to be small](https://github.com/apache/parquet-mr/blob/apache-parquet-1.12.2/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/DirectCodecFactory.java).

Note: this PR is currently rebased onto #2460, which is required for a clean test run.

## Documentation

N/A

## Testing

- `TestParquetWriter#testTPCHReadWriteDictGzip`
- Manual testing, especially under the async column reader.

A unit test that uses the async column reader is currently not possible because of DRILL-8138.

-- 
This is an automated message from the Apache Git Service.
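The workaround in DRILL-8139 relies on a create-use-release pattern: rather than sharing one stateful gzip decompressor between reader threads, each use gets its own, which is released afterwards. The minimal Java sketch below illustrates that pattern only. It does not use Parquet or Drill. As an assumption made to keep the example self-contained, the JDK's `java.util.zip` gzip streams stand in for the Parquet codec factory, and the class and method names (`SingleUseGzipSketch`, `gzip`, `readPage`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class SingleUseGzipSketch {

    // Hypothetical helper standing in for the Parquet writer: gzip one page.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        return bos.toByteArray();
    }

    // Hypothetical page read: the stateful decompressor (the Inflater owned by
    // GZIPInputStream) is created for this call only and released by close(),
    // so no decompression state is ever shared between threads.
    static byte[] readPage(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        }
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> pages = new ArrayList<>();
        List<byte[]> compressed = new ArrayList<>();
        for (int i = 0; i < 64; i++) {
            byte[] raw = ("page-" + i + ":" + "x".repeat(1000 + i)).getBytes(StandardCharsets.UTF_8);
            pages.add(raw);
            compressed.add(gzip(raw));
        }

        // Decompress all pages concurrently, as an async column reader might.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Future<byte[]>> results = new ArrayList<>();
            for (byte[] c : compressed) {
                results.add(pool.submit(() -> readPage(c)));
            }
            // Every decompressed page must match its original bytes exactly.
            for (int i = 0; i < pages.size(); i++) {
                if (!Arrays.equals(pages.get(i), results.get(i).get())) {
                    throw new AssertionError("page " + i + " was corrupted");
                }
            }
        } finally {
            pool.shutdown();
        }
        System.out.println("all " + pages.size() + " pages intact");
    }
}
```

Running `main` decompresses 64 pages on an 8-thread pool and should print `all 64 pages intact`. The trade-off mirrors the one described in the PR: a fresh decompressor per use costs an allocation and some native setup each time, in exchange for never having to synchronize access to a shared, stateful object.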