[jira] [Commented] (DRILL-6874) CTAS from json to parquet is not working on S3 storage

ASF GitHub Bot (JIRA) Fri, 07 Dec 2018 00:26:35 -0800


    [ https://issues.apache.org/jira/browse/DRILL-6874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16712482#comment-16712482 ]


ASF GitHub Bot commented on DRILL-6874:
---------------------------------------

KazydubB opened a new pull request #1565: DRILL-6874: CTAS from json to parquet is not working on S3 storage
URL: https://github.com/apache/drill/pull/1565



> CTAS from json to parquet is not working on S3 storage
> ------------------------------------------------------
>
>                 Key: DRILL-6874
>                 URL: https://issues.apache.org/jira/browse/DRILL-6874
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.15.0
>            Reporter: Denys Ordynskiy
>            Assignee: Bohdan Kazydub
>            Priority: Major
>             Fix For: 1.16.0
>
>         Attachments: ctasjsontoparquet.zip, drillbit.log, drillbit_queries.json, s3src.json, sqlline.log
>
>
> The JSON file "s3src.json" was uploaded to S3 storage.
> Querying the JSON file works fine:
> select * from s3.tmp.`s3src.json`;
> | id  |  first_name  |  last_name  |
> | 1   | first_name1  | last_name1  |
> | 2   | first_name2  | last_name2  |
> | 3   | first_name3  | last_name3  |
> | 4   | first_name4  | last_name4  |
> | 5   | first_name5  | last_name5  |
> 5 rows selected (2.803 seconds)
> CTAS from this JSON file completes successfully:
> create table s3.tmp.`ctasjsontoparquet` as select * from s3.tmp.`s3src.json`;
> | Fragment  | Number of records written  |
> | 0_0       | 5                          |
> 1 row selected (9.264 seconds)
> *Querying the created Parquet table {color:#d04437}throws an error:{color}*
> select * from s3.tmp.`ctasjsontoparquet`;
> {code:java}
> Error: INTERNAL_ERROR ERROR: Error in parquet record reader.
> Message: Failure in setting up reader
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message root {
>   optional int64 id;
>   optional binary first_name (UTF8);
>   optional binary last_name (UTF8);
> }
> , metadata: {drill-writer.version=2, drill.version=1.15.0-SNAPSHOT}}, blocks: 
> [BlockMetaData{5, 360 [ColumnMetaData{UNCOMPRESSED [id] optional int64 id  
> [BIT_PACKED, RLE, PLAIN], 4}, ColumnMetaData{UNCOMPRESSED [first_name] 
> optional binary first_name (UTF8)  [BIT_PACKED, RLE, PLAIN], 111}, 
> ColumnMetaData{UNCOMPRESSED [last_name] optional binary last_name (UTF8)  
> [BIT_PACKED, RLE, PLAIN], 241}]}]}
> Fragment 0:0
> Please, refer to logs for more information.
> [Error Id: 885723e4-8385-4fb0-87dd-c08b0570db95 on maprhost:31010] (state=,code=0)
> {code}
> The same CTAS query works fine on MapRFS and local file system storage.
> The log files, the source JSON file, and the Parquet file created on S3 are attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
