[jira] [Commented] (DRILL-6874) CTAS from json to parquet is not working on S3 storage

ASF GitHub Bot (JIRA) Sat, 08 Dec 2018 23:32:40 -0800


    [ https://issues.apache.org/jira/browse/DRILL-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16713890#comment-16713890 ]


ASF GitHub Bot commented on DRILL-6874:
---------------------------------------

amansinha100 closed pull request #1565: DRILL-6874: Close input stream after AsyncPageReaderTask is completed
URL: https://github.com/apache/drill/pull/1565

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java
index e429fb63a7c..8b5c926186b 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java
+++ b/exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/AsyncPageReader.java
@@ -417,6 +417,13 @@ public Void call() throws IOException {
       if (totalValuesRead >= totalValuesCount) {
         try {
           queue.put(ReadStatus.EMPTY);
+          // Some InputStreams (like S3ObjectInputStream) should be closed
+          // as soon as possible to make the connection reusable.
+          try {
+            parent.inputStream.close();
+          } catch (IOException e) {
+            logger.trace(String.format("[%s]: Failure while closing InputStream", name), e);
+          }
         } catch (InterruptedException e) {
           Thread.currentThread().interrupt();
           // Do nothing.
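
The rationale in the new comment (a stream such as S3ObjectInputStream holds a connection that cannot be reused until the stream is closed) can be illustrated with a minimal, self-contained Java sketch of the same pattern. A background task reads pages into a queue, and as soon as all data has been read it enqueues the EMPTY sentinel and closes the stream itself, logging rather than rethrowing any failure from close(). This is not Drill's code: PageReadTask, TrackingInputStream, run() and the two-value ReadStatus enum are hypothetical stand-ins, and the S3 stream is replaced by an in-memory stream that only records whether close() was called.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

public class EagerCloseSketch {

  // Hypothetical stand-in for Drill's ReadStatus; EMPTY marks end of data.
  enum ReadStatus { PAGE, EMPTY }

  // Hypothetical in-memory stream that only records whether close() was called,
  // standing in for a connection-backed stream such as S3ObjectInputStream.
  static class TrackingInputStream extends ByteArrayInputStream {
    volatile boolean closed = false;

    TrackingInputStream(byte[] data) {
      super(data);
    }

    @Override
    public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  // Hypothetical producer: reads fixed-size "pages", then signals EMPTY and
  // closes the stream immediately instead of leaving it open until teardown.
  static class PageReadTask implements Callable<Void> {
    private final InputStream in;
    private final BlockingQueue<ReadStatus> queue;
    private final int pageSize;

    PageReadTask(InputStream in, BlockingQueue<ReadStatus> queue, int pageSize) {
      if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
      this.in = in;
      this.queue = queue;
      this.pageSize = pageSize;
    }

    @Override
    public Void call() throws IOException {
      byte[] page = new byte[pageSize];
      try {
        while (in.read(page) != -1) {
          queue.put(ReadStatus.PAGE);
        }
        queue.put(ReadStatus.EMPTY);
        // Close as soon as everything is read so the underlying connection
        // could be released; a close failure is logged, not rethrown.
        try {
          in.close();
        } catch (IOException e) {
          System.err.println("Failure while closing InputStream: " + e);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return null;
    }
  }

  // Runs the producer on a worker thread and counts PAGE entries until EMPTY.
  static int run(TrackingInputStream in, int pageSize) throws Exception {
    BlockingQueue<ReadStatus> queue = new LinkedBlockingQueue<>();
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<Void> task = pool.submit(new PageReadTask(in, queue, pageSize));
      int pages = 0;
      while (queue.take() != ReadStatus.EMPTY) {
        pages++;
      }
      task.get(); // wait for the producer to finish, including its close()
      return pages;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    TrackingInputStream in = new TrackingInputStream(new byte[10]);
    int pages = run(in, 4);
    System.out.println("pages=" + pages + ", closed=" + in.closed);
  }
}
```

Running main() prints pages=3, closed=true: 10 bytes read in 4-byte pages give three PAGE entries, and the stream is closed by the producer rather than by whoever later tears down the reader. The nested try mirrors the patch: a failing close() is reported at low severity (here via System.err, standing in for logger.trace) and does not turn an otherwise successful read into an error.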


 



> CTAS from json to parquet is not working on S3 storage
> ------------------------------------------------------
>
>                 Key: DRILL-6874
>                 URL: https://issues.apache.org/jira/browse/DRILL-6874
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.14.0
>            Reporter: Denys Ordynskiy
>            Assignee: Bohdan Kazydub
>            Priority: Major
>              Labels: ready-to-commit
>             Fix For: 1.15.0
>
>         Attachments: ctasjsontoparquet.zip, drillbit.log, drillbit_queries.json, s3src.json, sqlline.log
>
>
> JSON file "s3src.json" was uploaded to S3 storage.
> Querying the JSON file works fine:
> select * from s3.tmp.`s3src.json`;
> | id  |  first_name  |  last_name  |
> | 1   | first_name1  | last_name1  |
> | 2   | first_name2  | last_name2  |
> | 3   | first_name3  | last_name3  |
> | 4   | first_name4  | last_name4  |
> | 5   | first_name5  | last_name5  |
> 5 rows selected (2.803 seconds)
> CTAS from this JSON file completes successfully:
> create table s3.tmp.`ctasjsontoparquet` as select * from s3.tmp.`s3src.json`;
> | Fragment  | Number of records written  |
> | 0_0       | 5                          |
> 1 row selected (9.264 seconds)
> *Querying the created Parquet table {color:#d04437}throws an error:{color}*
> select * from s3.tmp.`ctasjsontoparquet`;
> {code:java}
> Error: INTERNAL_ERROR ERROR: Error in parquet record reader.
> Message: Failure in setting up reader
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
>   optional int64 id;
>   optional binary first_name (UTF8);
>   optional binary last_name (UTF8);
> }
> , metadata: {drill-writer.version=2, drill.version=1.15.0-SNAPSHOT}}, blocks: 
> [BlockMetaData{5, 360 [ColumnMetaData{UNCOMPRESSED [id] optional int64 id  
> [BIT_PACKED, RLE, PLAIN], 4}, ColumnMetaData{UNCOMPRESSED [first_name] 
> optional binary first_name (UTF8)  [BIT_PACKED, RLE, PLAIN], 111}, 
> ColumnMetaData{UNCOMPRESSED [last_name] optional binary last_name (UTF8)  
> [BIT_PACKED, RLE, PLAIN], 241}]}]}
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: 885723e4-8385-4fb0-87dd-c08b0570db95 on maprhost:31010] (state=,code=0)
> {code}
> The same CTAS query works fine on MapRFS and FileSystem storage.
> The log files, the JSON file, and the Parquet file created on S3 are attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
