[ https://issues.apache.org/jira/browse/DRILL-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723094#comment-17723094 ]
James Turton commented on DRILL-7736:
-------------------------------------
Hi, are you able to check if this is still an issue in Drill 1.21.1?

> Error while reading from Parquet : DATA_READ ERROR: Exception occurred while reading from disk
> ----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7736
>                 URL: https://issues.apache.org/jira/browse/DRILL-7736
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Functions - Drill
>    Affects Versions: 1.17.0
>            Reporter: Sreeparna Bhabani
>            Priority: Blocker
>
> Facing an issue while creating a Parquet file in Drill from another Parquet file.
>
> *Summary-*
> I am rewriting one Parquet file from another Parquet file using CTAS with PARTITION BY (). The source Parquet file was generated from Python, but when I try to rewrite it the query fails. The details of the error are given below.
>
> *Version of Apache Drill-*
> 1.17
>
> *Memory config-*
> DRILL_HEAP=16G
> DRILL_MAX_DIRECT_MEMORY=32G
>
> *A few configs are listed here for information-*
> exec.sort.disable_managed=true
> store.parquet.reader.pagereader.async=true
> store.parquet.reader.pagereader.bufferedread=false
> planner.memory.max_query_memory_per_node=31147483648
> drill.exec.memory.operator.output_batch_size=4194304
>
> *Details of volume-*
> The number of rows for which I am trying the CTAS is 25245241, across 145 columns.
> FYI - I am able to create the Parquet table using CTAS for a smaller number of rows.
>
> *CTAS script-*
> CREATE TABLE dfs.root.<Table_name>
> PARTITION BY (<Column1>,<Column2>,<Column3>)
> AS SELECT *
> FROM dfs.root.<source_parquet>;
>
> *Error Log-*
> 2020-05-07 xx:xx:xx,504 [scan-4] INFO o.a.d.e.s.p.c.AsyncPageReader - User Error Occurred: Exception occurred while reading from disk. (can not read class org.apache.parquet.format.PageHeader: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.)
> org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Exception occurred while reading from disk.
> File: <xxx>.parquet
> Column: <xxx>
> Row Group Start: 25545832
> [Error Id: 4157803d-a37e-4693-bc1a-b654807222ed ]
>     at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:637)
>     at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.handleAndThrowException(AsyncPageReader.java:190)
>     at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.access$700(AsyncPageReader.java:84)
>     at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:480)
>     at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:394)
>     at org.apache.drill.exec.util.concurrent.ExecutorServiceUtil$CallableTaskWrapper.call(ExecutorServiceUtil.java:85)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.
>     at org.apache.parquet.format.Util.read(Util.java:232)
>     at org.apache.parquet.format.Util.readPageHeader(Util.java:81)
>     at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:437)
>     ... 6 common frames omitted
> Caused by: shaded.parquet.org.apache.thrift.transport.TTransportException: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.
>     at shaded.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>     at shaded.parquet.org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>     at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:634)
>     at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:539)
>     at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:158)
>     at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:973)
>     at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:966)
>     at org.apache.parquet.format.PageHeader.read(PageHeader.java:843)
>     at org.apache.parquet.format.Util.read(Util.java:229)
>     ... 8 common frames omitted
> Caused by: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.
>     at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:910)
>     at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:862)
>     at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:841)
>     at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:567)
>     at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757)
>     at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
>     at java.io.DataInputStream.read(DataInputStream.java:149)
>     at java.io.FilterInputStream.read(FilterInputStream.java:133)
>     at shaded.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>     ... 16 common frames omitted

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
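For anyone retesting this on 1.21.1, the reporter's environment can be approximated in a single session before rerunning the CTAS. A minimal sketch, using only the option names and values quoted in the report (the table and column placeholders are left exactly as the reporter elided them):

```sql
-- Session options as quoted in the report.
-- Note: drill.exec.memory.operator.output_batch_size is assumed here to be
-- a boot-time option configured outside the session, so it is not set below.
ALTER SESSION SET `exec.sort.disable_managed` = true;
ALTER SESSION SET `store.parquet.reader.pagereader.async` = true;
ALTER SESSION SET `store.parquet.reader.pagereader.bufferedread` = false;
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 31147483648;

-- The failing CTAS, placeholders as in the report.
CREATE TABLE dfs.root.<Table_name>
PARTITION BY (<Column1>,<Column2>,<Column3>)
AS SELECT *
FROM dfs.root.<source_parquet>;
```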