[ 
https://issues.apache.org/jira/browse/DRILL-7736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723094#comment-17723094
 ] 

James Turton commented on DRILL-7736:
-------------------------------------

Hi, are you able to check if this is still an issue in Drill 1.21.1?

> Error while reading from Parquet : DATA_READ ERROR: Exception occurred while 
> reading from disk
> ----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-7736
>                 URL: https://issues.apache.org/jira/browse/DRILL-7736
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Functions - Drill
>    Affects Versions: 1.17.0
>            Reporter: Sreeparna Bhabani
>            Priority: Blocker
>
> I am facing an issue while creating a Parquet file in Drill from another 
> Parquet file.
> *Summary-*
> I am rewriting one Parquet file from another Parquet file using CTAS 
> PARTITION BY (). The source Parquet file was generated from Python, but when 
> I try to rewrite the Parquet file I get an error. The details of the error 
> are given below.
> *Version of Apache Drill* -
> 1.17
> *Memory config-*
> DRILL_HEAP=16G
> DRILL_MAX_DIRECT_MEMORY=32G
> *A few configs are mentioned here for information-*
> exec.sort.disable_managed=true
> store.parquet.reader.pagereader.async=true;
> store.parquet.reader.pagereader.bufferedread=false;
> planner.memory.max_query_memory_per_node=31147483648
> drill.exec.memory.operator.output_batch_size=4194304
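For reference, options like those above can also be scoped to a single session rather than set system-wide. A sketch using the option names and values listed (the `ALTER SESSION SET` form is standard Drill syntax; the values are simply the ones reported here, not a recommendation):

```sql
-- Apply the reported Parquet reader and memory options for the current session only.
ALTER SESSION SET `store.parquet.reader.pagereader.async` = true;
ALTER SESSION SET `store.parquet.reader.pagereader.bufferedread` = false;
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 31147483648;
ALTER SESSION SET `drill.exec.memory.operator.output_batch_size` = 4194304;
```

Session-scoped settings revert when the connection closes, which makes it easier to test whether a particular option (e.g. disabling the async page reader) changes the failure behavior.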
> *Details of volume-*
> The CTAS input has 25,245,241 rows and 145 columns.
> FYI - I am able to create Parquet using CTAS for a smaller number of rows.
> *CTAS script-*
> CREATE TABLE dfs.root.<Table_name>
>  PARTITION BY (<Column1>,<Column2>,<Column3>)
>  AS SELECT * 
>  FROM dfs.root.<source_parquet>;
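Since CTAS reportedly succeeds for smaller row counts, one untested workaround sketch is to split the job into slices by filtering on a partition column. The placeholders are kept as above, and `<some_value>` and the `_slice1` suffix are hypothetical, not from the report:

```sql
-- Hypothetical sketch: write one slice of the source data at a time,
-- one CTAS per partition-column value, to reduce per-query volume.
CREATE TABLE dfs.root.<Table_name>_slice1
 PARTITION BY (<Column1>,<Column2>,<Column3>)
 AS SELECT *
 FROM dfs.root.<source_parquet>
 WHERE <Column1> = '<some_value>';
```

If each slice succeeds, that would point at total volume (memory pressure or long-running HDFS reads being interrupted) rather than data corruption in the source file.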
> *Error Log-*
> 2020-05-07 xx:xx:xx,504 [scan-4] INFO  o.a.d.e.s.p.c.AsyncPageReader - User 
> Error Occurred: Exception occurred while reading from disk. (can not read 
> class org.apache.parquet.format.PageHeader: java.io.InterruptedIOException: 
> Interrupted while choosing DataNode for read.)
>  org.apache.drill.common.exceptions.UserException: DATA_READ ERROR: Exception 
> occurred while reading from disk.
> File:  <xxx>.parquet
>  Column:  <xxx>
>  Row Group Start:  25545832
> [Error Id: 4157803d-a37e-4693-bc1a-b654807222ed ]
>   at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:637)
>   at 
> org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.handleAndThrowException(AsyncPageReader.java:190)
>   at 
> org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.access$700(AsyncPageReader.java:84)
>   at 
> org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:480)
>   at 
> org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:394)
>   at 
> org.apache.drill.exec.util.concurrent.ExecutorServiceUtil$CallableTaskWrapper.call(ExecutorServiceUtil.java:85)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>  Caused by: java.io.IOException: can not read class 
> org.apache.parquet.format.PageHeader: java.io.InterruptedIOException: 
> Interrupted while choosing DataNode for read.
>   at org.apache.parquet.format.Util.read(Util.java:232)
>   at org.apache.parquet.format.Util.readPageHeader(Util.java:81)
>   at 
> org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader$AsyncPageReaderTask.call(AsyncPageReader.java:437)
>   ... 6 common frames omitted
>  Caused by: shaded.parquet.org.apache.thrift.transport.TTransportException: 
> java.io.InterruptedIOException: Interrupted while choosing DataNode for read.
>   at 
> shaded.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>   at 
> shaded.parquet.org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
>   at 
> shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:634)
>   at 
> shaded.parquet.org.apache.thrift.protocol.TCompactProtocol.readFieldBegin(TCompactProtocol.java:539)
>   at 
> org.apache.parquet.format.InterningProtocol.readFieldBegin(InterningProtocol.java:158)
>   at 
> org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:973)
>   at 
> org.apache.parquet.format.PageHeader$PageHeaderStandardScheme.read(PageHeader.java:966)
>   at org.apache.parquet.format.PageHeader.read(PageHeader.java:843)
>   at org.apache.parquet.format.Util.read(Util.java:229)
>   ... 8 common frames omitted
>  Caused by: java.io.InterruptedIOException: Interrupted while choosing 
> DataNode for read.
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:910)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:862)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:841)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:567)
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:757)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:829)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at java.io.FilterInputStream.read(FilterInputStream.java:133)
>   at 
> shaded.parquet.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>   ... 16 common frames omitted



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
