[
https://issues.apache.org/jira/browse/DRILL-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723094#comment-17723094
]
James Turton commented on DRILL-7736:
-------------------------------------
Hi, are you able to check if this is still an issue in Drill 1.21.1?
> Error while reading from Parquet : DATA_READ ERROR: Exception occurred while
> reading from disk
> ----------------------------------------------------------------------------------------------
>
> Key: DRILL-7736
> URL: https://issues.apache.org/jira/browse/DRILL-7736
> Project: Apache Drill
> Issue Type: Improvement
> Components: Functions - Drill
> Affects Versions: 1.17.0
> Reporter: Sreeparna Bhabani
> Priority: Blocker
>
> I am facing an issue while creating a Parquet file in Drill from another
> Parquet file.
> *Summary-*
> I am rewriting one Parquet file from another Parquet file using CTAS ...
> PARTITION BY (). The source Parquet file was generated from Python. When I
> try to rewrite the Parquet file I get an error; the details of the error are
> given below.
> *Version of Apache Drill* -
> 1.17
> *Memory config-*
> DRILL_HEAP=16G
> DRILL_MAX_DIRECT_MEMORY=32G
> *A few configs are mentioned here for information-*
> exec.sort.disable_managed=true
> store.parquet.reader.pagereader.async=true
> store.parquet.reader.pagereader.bufferedread=false
> planner.memory.max_query_memory_per_node=31147483648
> drill.exec.memory.operator.output_batch_size=4194304
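> For reference, the session/system options above (other than the environment
> variables and the boot-time output_batch_size setting) are typically applied
> through Drill SQL before running the CTAS; a minimal sketch using the option
> names and values listed above:
> ALTER SESSION SET `exec.sort.disable_managed` = true;
> ALTER SESSION SET `store.parquet.reader.pagereader.async` = true;
> ALTER SESSION SET `store.parquet.reader.pagereader.bufferedread` = false;
> ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 31147483648;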
> *Details of volume-*
> The table I am trying to create with CTAS has 25245241 rows and 145 columns.
> FYI - I am able to create Parquet using CTAS for a smaller number of rows.
> *CTAS script-*
> CREATE TABLE dfs.root.<Table_name>
> PARTITION BY (<Column1>,<Column2>,<Column3>)
> AS SELECT *
> FROM dfs.root.<source_parquet>;
> *Error Log-*
> 2020-05-07 xx:xx:xx,504 [scan-4] INFO o.a.d.e.s.p.c.AsyncPageReader - User Error Occurred: Exception occurred while reading from disk. (can not read class org.apache.parquet.format.PageHeader: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.)
> org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Exception occurred while reading from disk.
> File: <xxx>.parquet
> Column: <xxx>
> Row Group Start: 25545832
> [Error Id: 4157803d-a37e-4693-bc1a-b654807222ed ]
> at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:637)
> at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.handleAndThrowException(AsyncPageReader.java:190)
> at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.access$700(AsyncPageReader.java:84)
> at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:480)
> at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:394)
> at org.apache.drill.exec.util.concurrent.ExecutorServiceUtil$CallableTaskWrapper.call(ExecutorServiceUtil.java:85)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: can not read class org.apache.parquet.format.PageHeader: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.
> at org.apache.parquet.format.Util.read(Util.java:232)
> at org.apache.parquet.format.Util.readPageHeader(Util.java:81)
> at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:437)
> ... 6 common frames omitted
> Caused by: shaded.parquet.org.apache.thrift.transport.TTransportException: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.
> at shaded.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
> at shaded.parquet.org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
> at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:634)
> at shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:539)
> at org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:158)
> at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:973)
> at org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:966)
> at org.apache.parquet.format.PageHeader.read(PageHeader.java:843)
> at org.apache.parquet.format.Util.read(Util.java:229)
> ... 8 common frames omitted
> Caused by: java.io.InterruptedIOException: Interrupted while choosing DataNode for read.
> at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:910)
> at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:862)
> at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:841)
> at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:567)
> at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at java.io.FilterInputStream.read(FilterInputStream.java:133)
> at shaded.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> ... 16 common frames omitted
--
This message was sent by Atlassian Jira
(v8.20.10#820010)