[jira] [Commented] (DRILL-6874) CTAS from json to parquet is not working on S3 storage

Bohdan Kazydub (JIRA) Fri, 07 Dec 2018 00:42:46 -0800


    [ 
https://issues.apache.org/jira/browse/DRILL-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712497#comment-16712497
 ]


Bohdan Kazydub commented on DRILL-6874:
---------------------------------------

The issue is caused by S3 connections not being released immediately after a 
column has been processed, so the connection pool has no free connection with 
which to start processing another column asynchronously. To set the maximum 
number of connections, use the `fs.s3a.connection.maximum` configuration 
parameter (an integer, 15 by default).
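
A minimal sketch of raising the limit, assuming the property is set in a 
Hadoop-style core-site.xml that Drill reads for its S3 storage plugin. The 
value 100 is purely illustrative and is not a figure taken from this issue; a 
suitable value presumably depends on how many columns are read concurrently.

```xml
<!-- core-site.xml (assumed location: the Drill conf directory) -->
<configuration>
  <property>
    <name>fs.s3a.connection.maximum</name>
    <!-- Illustrative value only; the default mentioned above is 15 -->
    <value>100</value>
  </property>
</configuration>
```

A restart of the drillbits is likely needed for a changed core-site.xml to 
take effect.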

> CTAS from json to parquet is not working on S3 storage
> ------------------------------------------------------
>
>                 Key: DRILL-6874
>                 URL: https://issues.apache.org/jira/browse/DRILL-6874
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.15.0
>            Reporter: Denys Ordynskiy
>            Assignee: Bohdan Kazydub
>            Priority: Major
>             Fix For: 1.16.0
>
>         Attachments: ctasjsontoparquet.zip, drillbit.log, 
> drillbit_queries.json, s3src.json, sqlline.log
>
>
> The JSON file "s3src.json" was uploaded to S3 storage.
> Querying the JSON file works fine:
> select * from s3.tmp.`s3src.json`;
> | id  |  first_name  |  last_name  |
> | 1   | first_name1  | last_name1  |
> | 2   | first_name2  | last_name2  |
> | 3   | first_name3  | last_name3  |
> | 4   | first_name4  | last_name4  |
> | 5   | first_name5  | last_name5  |
> 5 rows selected (2.803 seconds)
> CTAS from this JSON file completes successfully:
> create table s3.tmp.`ctasjsontoparquet` as select * from s3.tmp.`s3src.json`;
> | Fragment  | Number of records written  |
> | 0_0       | 5                          |
> 1 row selected (9.264 seconds)
> Querying the created parquet table throws an error:
> select * from s3.tmp.`ctasjsontoparquet`;
> {code:java}
> Error: INTERNAL_ERROR ERROR: Error in parquet record reader.
> Message: Failure in setting up reader
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
>   optional int64 id;
>   optional binary first_name (UTF8);
>   optional binary last_name (UTF8);
> }
> , metadata: {drill-writer.version=2, drill.version=1.15.0-SNAPSHOT}}, blocks: 
> [BlockMetaData{5, 360 [ColumnMetaData{UNCOMPRESSED [id] optional int64 id  
> [BIT_PACKED, RLE, PLAIN], 4}, ColumnMetaData{UNCOMPRESSED [first_name] 
> optional binary first_name (UTF8)  [BIT_PACKED, RLE, PLAIN], 111}, 
> ColumnMetaData{UNCOMPRESSED [last_name] optional binary last_name (UTF8)  
> [BIT_PACKED, RLE, PLAIN], 241}]}]}
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: 885723e4-8385-4fb0-87dd-c08b0570db95 on maprhost:31010] 
> (state=,code=0)
> {code}
> The same CTAS query works fine on MapRFS and local FileSystem storage.
> The log files, the JSON file and the parquet file created on S3 are attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
