[ https://issues.apache.org/jira/browse/IMPALA-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760811#comment-16760811 ]
Tim Armstrong commented on IMPALA-8109:
---------------------------------------
{noformat}
tpch10> select distinct * from lineitem_gz order by l_partkey limit 5;
Query: select distinct * from lineitem_gz order by l_partkey limit 5
Query submitted at: 2019-02-05 05:34:04 (Coordinator: http://tarmstrong-box:25000)
Query progress can be monitored at: http://tarmstrong-box:25000/query_plan?query_id=8147fabfd162bebc:42e1b28e00000000
42516801 1 2 2 38.00 34238.00 0.03 0.08 R F 1995-05-04 1995-04-24 1995-06-02 NONE FOB foxes wake quickly plat
5120486 1 2 1 42.00 37842.00 0.02 0.01 A F 1992-06-06 1992-03-26 1992-06-12 DELIVER IN PERSON SHIP blithely
9676064 1 25002 2 45.00 40545.00 0.09 0.01 N O 1997-10-06 1997-12-30 1997-10-13 NONE TRUCK ithely idle foxes nod alongside of the
52024262 1 50002 5 43.00 38743.00 0.07 0.00 R F 1994-12-11 1994-10-23 1995-01-01 NONE RAIL use. quietl
23742531 1 50002 1 42.00 37842.00 0.03 0.08 A F 1993-04-12 1993-06-01 1993-05-08 TAKE BACK RETURN RAIL foxes. fluffily ironic theodolites affi
WARNINGS: For better performance, snappy-, gzip-, and bzip-compressed files should not be split into multiple HDFS blocks. file=hdfs://localhost:20500/test-warehouse/tpch_gzip10.lineitem/lineitem.tbl.gz offset 402653184 (1 of 21 similar)
Fetched 5 row(s) in 406.65s
{noformat}
I'm trying to reproduce this with IMPALA-7543 reverted, but the query still
succeeds. Could you post the hdfs fsck output for your file? That may provide
some clues. Here is the output for mine:
{noformat}
$ hdfs fsck hdfs://localhost:20500/test-warehouse/tpch_gzip10.lineitem/lineitem.tbl.gz
Connecting to namenode via http://localhost:5070/fsck?ugi=tarmstrong&path=%2Ftest-warehouse%2Ftpch_gzip10.lineitem%2Flineitem.tbl.gz
FSCK started by tarmstrong (auth:SIMPLE) from /127.0.0.1 for path /test-warehouse/tpch_gzip10.lineitem/lineitem.tbl.gz at Tue Feb 05 05:53:31 PST 2019
Status: HEALTHY
Number of data-nodes: 3
Number of racks: 1
Total dirs: 0
Total symlinks: 0
Replicated Blocks:
Total size: 2859414565 B
Total files: 1
Total blocks (validated): 1 (avg. block size 2859414565 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 3.0
Missing blocks: 0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Blocks queued for replication: 0
Erasure Coded Block Groups:
Total size: 0 B
Total files: 0
Total block groups (validated): 0
Minimally erasure-coded block groups: 0
Over-erasure-coded block groups: 0
Under-erasure-coded block groups: 0
Unsatisfactory placement block groups: 0
Average block group size: 0.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0
Blocks queued for replication: 0
FSCK ended at Tue Feb 05 05:53:31 PST 2019 in 0 milliseconds
The filesystem under path '/test-warehouse/tpch_gzip10.lineitem/lineitem.tbl.gz' is HEALTHY
{noformat}
> Impala cannot read the gzip files bigger than 2 GB
> --------------------------------------------------
>
> Key: IMPALA-8109
> URL: https://issues.apache.org/jira/browse/IMPALA-8109
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.12.0
> Reporter: hakki
> Assignee: Tim Armstrong
> Priority: Major
>
> When querying a partition containing gzip files, the query fails with the
> error below:
> WARNINGS: Disk I/O error: Error seeking to -2147483648 in file:
> hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXXXXXX.gz:
> Error(255): Unknown error 255
> Root cause: EOFException: Cannot seek to negative offset
> The file hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXXXXXX.gz
> is a delimited text file larger than 2 GB (approximately 2.4 GB). Its
> uncompressed size is ~13 GB.
> The impalad version is 2.12.0-cdh5.15.0.
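
A note on the offset in the quoted error: -2147483648 is the smallest value a signed 32-bit integer can hold, and it is what a byte offset of exactly 2 GiB (2^31) becomes if it is narrowed to 32 bits. The sketch below only illustrates that arithmetic. `as_int32` is a hypothetical helper, not Impala code, and nothing in this thread confirms that such a narrowing is the actual cause.

```python
def as_int32(offset: int) -> int:
    """Reinterpret the low 32 bits of `offset` as a signed 32-bit integer.

    Hypothetical helper for illustration only; it mimics what happens when
    a 64-bit file offset is stored in a signed 32-bit variable.
    """
    return ((offset + 2**31) % 2**32) - 2**31


TWO_GIB = 2**31  # 2147483648 bytes

# Offsets below 2 GiB survive the narrowing unchanged.
print(as_int32(TWO_GIB - 1))

# An offset of exactly 2 GiB wraps around to the most negative 32-bit value.
print(as_int32(TWO_GIB))
```

Under that assumption, any file smaller than 2 GiB would never produce a negative seek offset, which would be consistent with the report that only the larger gzip files fail.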