[jira] [Commented] (HIVE-27071) Select query with LIMIT clause can fail if there are marker files like "_SUCCESS" and "_MANIFEST"

Taraka Rama Rao Lethavadla (Jira) Sun, 12 Feb 2023 23:17:27 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-27071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17687769#comment-17687769
 ]


Taraka Rama Rao Lethavadla commented on HIVE-27071:
---------------------------------------------------

In addition to what has already been reported, how about supporting a regex in 
the query, so that files matching the regex are skipped while the query runs? 
One advantage is that we could skip the many unwanted files that are not 
relevant to Hive every time a query is run.
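
To make the suggestion concrete, here is a minimal sketch of regex-based file 
skipping. It is only an illustration of the idea, not Hive code: the function 
name filter_input_paths, the skip_regex parameter, the default pattern and the 
data file name part-00000 are all hypothetical. The default pattern assumes the 
common Hadoop convention that names starting with "_" or "." are not data files.

```python
import re

# Hypothetical default (an assumption, not a Hive setting): skip names that
# start with "_" or ".", which covers markers such as _SUCCESS and _MANIFEST.
DEFAULT_SKIP_REGEX = r"[_.].*"


def filter_input_paths(paths, skip_regex=DEFAULT_SKIP_REGEX):
    """Return the paths whose file name does not match skip_regex."""
    pattern = re.compile(skip_regex)
    kept = []
    for path in paths:
        # Match on the last path component only, so that parent directory
        # names never cause a data file to be skipped.
        name = path.rsplit("/", 1)[-1]
        if pattern.fullmatch(name) is None:
            kept.append(path)
    return kept


if __name__ == "__main__":
    base = "hdfs://name-node-host/table/partition/"
    # part-00000 is an illustrative data file name, not taken from the report.
    listing = [base + "_SUCCESS", base + "_MANIFEST", base + "part-00000"]
    for kept_path in filter_input_paths(listing):
        print(kept_path)
```

Matching on the file name rather than the full path keeps the regex short and 
avoids accidentally skipping every file under a directory whose name happens to 
match.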

> Select query with LIMIT clause can fail if there are marker files like 
> "_SUCCESS" and "_MANIFEST"
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27071
>                 URL: https://issues.apache.org/jira/browse/HIVE-27071
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 4.0.0
>            Reporter: Sai Hemanth Gantasala
>            Priority: Major
>
> Spark clients create marker files like "_SUCCESS" and "_MANIFEST" under the 
> table/partition path at the end of a write operation. For example: 
> 'hdfs://name-node-host/table/partition/_SUCCESS'
> Whenever Hive tries to read that table with a LIMIT clause, the query can 
> fail with the following error:
> {code:java}
> ERROR : Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1676095298574_0017_2_00, diagnostics=[Vertex 
> vertex_1676095298574_0017_2_00 [Map 1] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: trade initializer failed, 
> vertex=vertex_1676095298574_0017_2_00 [Map 1], 
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: 
> hdfs://name-node-host/table/partition/_MANIFEST
> Input path does not exist: hdfs://name-node-host/table/partition/_SUCCESS at 
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:300)
> at 
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:240)
> at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:328)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:579)
>  {code}
> The Hive execution engine should ignore these marker files while reading the 
> table/partition data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
