[
https://issues.apache.org/jira/browse/HIVE-9863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565190#comment-14565190
]
Sergio Peña commented on HIVE-9863:
-----------------------------------
[~xuefuz]
I ran the same tests again, this time using the Hive CLI with Spark, and they work fine.
No exception is thrown.
{noformat}
hive> desc formatted parquet;
...
# Storage Information
SerDe Library:
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
...
hive> select count(*) from parquet;
Query ID = sergio_20150529133253_5fd9da28-d73b-4137-a04a-3975108dbba7
...
Starting Spark Job = 513800e8-2d6a-47af-830c-d18099e52bc3
2015-05-29 13:32:54,261 Stage-3_0: 1/1 Finished Stage-4_0: 1/1 Finished
Status: Finished successfully in 1.01 seconds
OK
500
Time taken: 1.198 seconds, Fetched: 1 row(s)
{noformat}
> Querying parquet tables fails with IllegalStateException [Spark Branch]
> -----------------------------------------------------------------------
>
> Key: HIVE-9863
> URL: https://issues.apache.org/jira/browse/HIVE-9863
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
>
> This does not necessarily happen only in the Spark branch. Queries such as
> select count(*) from table_name fail with the following error:
> {code}
> hive> select * from content limit 2;
> OK
> Failed with exception java.io.IOException:java.lang.IllegalStateException:
> All the offsets listed in the split should be found in the file. expected:
> [4, 4] found: [BlockMetaData{69644, 881917418 [ColumnMetaData{GZIP [guid]
> BINARY [PLAIN, BIT_PACKED], 4}, ColumnMetaData{GZIP [collection_name] BINARY
> [PLAIN_DICTIONARY, BIT_PACKED], 389571}, ColumnMetaData{GZIP [doc_type]
> BINARY [PLAIN_DICTIONARY, BIT_PACKED], 389790}, ColumnMetaData{GZIP [stage]
> INT64 [PLAIN_DICTIONARY, BIT_PACKED], 389887}, ColumnMetaData{GZIP
> [meta_timestamp] INT64 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 397673},
> ColumnMetaData{GZIP [doc_timestamp] INT64 [RLE, PLAIN_DICTIONARY,
> BIT_PACKED], 422161}, ColumnMetaData{GZIP [meta_size] INT32 [RLE,
> PLAIN_DICTIONARY, BIT_PACKED], 460215}, ColumnMetaData{GZIP [content_size]
> INT32 [RLE, PLAIN_DICTIONARY, BIT_PACKED], 521728}, ColumnMetaData{GZIP
> [source] BINARY [RLE, PLAIN, BIT_PACKED], 683740}, ColumnMetaData{GZIP
> [delete_flag] BOOLEAN [RLE, PLAIN, BIT_PACKED], 683787}, ColumnMetaData{GZIP
> [meta] BINARY [RLE, PLAIN, BIT_PACKED], 683834}, ColumnMetaData{GZIP
> [content] BINARY [RLE, PLAIN, BIT_PACKED], 6992365}]}] out of: [4,
> 129785482, 260224757] in range 0, 134217728
> Time taken: 0.253 seconds
> hive>
> {code}
> I can reproduce the problem with either local or yarn-cluster. It also seems
> to happen with MR. Thus, I suspect this is a Parquet problem.