[
https://issues.apache.org/jira/browse/IMPALA-10042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170393#comment-17170393
]
kishorekumar kuruguntla edited comment on IMPALA-10042 at 8/7/20, 5:53 PM:
---------------------------------------------------------------------------
Thanks [~tarmstrong] for your reply. My responses are in blue font below.
Does "invalidate metadata <dbname.table_name>" help?
{color:#0747a6}I ran Invalidate metadata and Refresh <table_name> several
times, but it doesn't help (I still get the stale metadata error).{color}
It would also be helpful to understand how that file was ingested. If files are
overwritten in place, there are more caching-related issues that are possible
(we've fixed all the known ones in later versions, though). We had some related
issues due to IMPALA-8561.
{color:#0747a6}Files are ingested from Hive Beeline.{color}
{color:#0747a6}As part of our ETL, we create a staging table and copy the data
from the staging table to the final table using the LOAD INPATH command
(overwriting in place). All of these commands are executed in Hive Beeline.{color}
{color:#0747a6}We use Hive for our ETL process and Impala for dashboard
queries.{color}
{color:#0747a6}Once our ETL process has finished, we run the following commands:{color}
{color:#0747a6}1. Invalidate metadata{color}
{color:#0747a6}2. Refresh{color}
{color:#0747a6}3. Compute stats{color}
{color:#0747a6}Both Invalidate metadata and Refresh execute successfully, but
Compute stats fails with the error (This could be due to stale metadata).{color}
{color:#0747a6}We load 134 tables, and this issue occurs for 2 or 3 random
tables on random days.{color}
{color:#0747a6}Interestingly, we recently upgraded to Impala 3.2.0 and are now
facing this issue.{color}
{color:#0747a6}*To work around this issue, I have to run the following commands
in impala-shell:*{color}
{color:#0747a6}ALTER TABLE <db_name>.<table_name> RENAME TO
<db_name>.<table_name>_stage;{color}
{color:#0747a6}CREATE TABLE <db_name>.<table_name> STORED AS PARQUET AS SELECT
* FROM <db_name>.<table_name>_stage;{color}
{color:#0747a6}DROP TABLE <db_name>.<table_name>_stage;{color}
{color:#0747a6}COMPUTE STATS <db_name>.<table_name>;{color}
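Since the workaround has to be repeated for whichever tables fail on a given day, the four statements above could be generated per table. The following is only a minimal sketch: the function name `rebuild_statements`, the `suffix` parameter, and the identifier check are hypothetical and not part of Impala or impala-shell. It only builds the statement strings, and running them is left to whatever client is already in use.

```python
import re

# Conservative identifier check (an assumption for this sketch, stricter than
# what Impala itself accepts): letters, digits and underscores only.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def rebuild_statements(db_name, table_name, suffix="_stage"):
    """Return the rename / CTAS / drop / compute-stats statements, in order."""
    for name in (db_name, table_name):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    target = f"{db_name}.{table_name}"
    backup = f"{target}{suffix}"
    return [
        # 1. Move the affected table out of the way.
        f"ALTER TABLE {target} RENAME TO {backup}",
        # 2. Rewrite the data into a fresh Parquet table under the old name.
        f"CREATE TABLE {target} STORED AS PARQUET AS SELECT * FROM {backup}",
        # 3. Drop the renamed copy.
        f"DROP TABLE {backup}",
        # 4. Recompute statistics on the rebuilt table.
        f"COMPUTE STATS {target}",
    ]
```

For example, `rebuild_statements("sales", "orders")` returns the four statements for a hypothetical table `sales.orders`, which can then be passed one by one to impala-shell.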
{color:#0747a6}Thanks for sharing the information about IMPALA-8561. I think we
are facing the same issue (the one mentioned in its workaround section).{color}
You could also help narrow down the issue by looking at the file size in "show
files in <tablename>" in Impala and comparing it to the file size in HDFS - it
will likely be different, which would explain why the version number - the last
few bytes in the file - doesn't match.
{color:#0747a6}For now, I have overwritten the table files.{color}
{color:#0747a6}I am sure this issue will occur again in a day or two, so next
time I will compare the file sizes in Impala and HDFS.{color}
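The size comparison suggested above can be sketched in Python against a local copy of the affected file (for example, one fetched with `hdfs dfs -get`). An unencrypted Parquet file ends with the 4-byte marker `PAR1`. The assumption here, based on the explanation in this thread, is that a reader trusting a stale cached size looks for that marker at the wrong offset. The function name `check_parquet_tail` and the `cached_size` parameter are hypothetical; `cached_size` stands for the size reported by "show files in <tablename>".

```python
import os

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at the end of an unencrypted Parquet file


def check_parquet_tail(path, cached_size=None):
    """Compare the real file size with a cached size and inspect the trailing bytes."""
    actual_size = os.path.getsize(path)
    tail_at_cached = None
    with open(path, "rb") as f:
        # Last four bytes of the file as it really is on disk.
        f.seek(max(actual_size - 4, 0))
        real_tail = f.read(4)
        # Four bytes that would be read as the tail if the cached size were trusted.
        if cached_size is not None and 4 <= cached_size <= actual_size:
            f.seek(cached_size - 4)
            tail_at_cached = f.read(4)
    return {
        "actual_size": actual_size,
        "size_matches": cached_size is None or cached_size == actual_size,
        "tail_is_magic": real_tail == PARQUET_MAGIC,
        "tail_at_cached_size_is_magic": (
            None if tail_at_cached is None else tail_at_cached == PARQUET_MAGIC
        ),
    }
```

If `size_matches` is false while `tail_is_magic` is true, the file itself looks intact and the mismatch points toward the cached file size rather than a corrupt file.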
> ERROR: File has an invalid version number: This could be due to stale
> metadata. Try running "refresh <Table_name>".
> --------------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-10042
> URL: https://issues.apache.org/jira/browse/IMPALA-10042
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, Catalog
> Affects Versions: Impala 3.2.0
> Reporter: kishorekumar kuruguntla
> Priority: Major
> Attachments: image-2020-08-03-10-31-37-405.png
>
>
> Hi,
> I am performing ETL and loading a Parquet table through Hive.
> I ran invalidate metadata <db_name.table_name> in Impala.
> When I run "compute stats <db_name.table_name>" or "Select
> count ( * ) from <db_name.table_name>", I get a "File has an invalid version
> number" error.
> *Full Error:*
> ERROR: File '<full_hdfs_path>' has an invalid version number: <Random_number>
> This could be due to stale metadata. Try running "refresh
> <db_name.table_name>".
>
> !image-2020-08-03-10-31-37-405.png!
>
> I tried running "refresh <db_name.table_name>" multiple times, but "Select
> count ( * )" and "Compute stats" still do not work.
> I can run "Select count ( * ) from <db_name.table_name>" successfully in Hive
> Beeline, but it does not work in Impala.
>
> It seems this issue was fixed in
> https://issues.apache.org/jira/browse/IMPALA-2477 (Impala 2.3.0),
> but we are still facing it in *version 3.2.0-cdh6.2.1*.
>
>