Nikolay Skovorodin created HUDI-8771:
----------------------------------------

             Summary: Incorrect classification of input paths in 
InputPathHandler
                 Key: HUDI-8771
                 URL: https://issues.apache.org/jira/browse/HUDI-8771
             Project: Apache Hudi
          Issue Type: Bug
          Components: hive
            Reporter: Nikolay Skovorodin
             Fix For: 1.1.0


In the InputPathHandler.parseInputPaths method, classification of input paths 
can go wrong when one path is a substring of another.

The cause is this check, which looks up a previously resolved metaClient for 
inputPath:
{code:java}
if (inputPath.toString().contains(metaClient.getBasePath().toString())) { {code}
 

For example, take the input paths
 * /tmp/junit1202353861292872173/raw_trips/2019/05/25
 * /tmp/junit1202353861292872173/raw_trips_cow/2019/05/25

and their respective base paths
 * {{/tmp/junit1202353861292872173/raw_trips}}
 * {{/tmp/junit1202353861292872173/raw_trips_cow}}

Because the check uses String.contains, both input paths are incorrectly 
matched to the `raw_trips` base path: /raw_trips is a substring of 
/raw_trips_cow.
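The mismatch can be reproduced outside Hudi with the two paths above. The sketch below is illustrative only: the class and method names are hypothetical, and it uses java.nio.file.Path in place of Hudi's own path types so that it stays self-contained. It contrasts the substring check with a comparison on whole path components, which is one possible way to avoid the false match and not necessarily the fix adopted for this issue.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-alone sketch; not part of InputPathHandler.
class PathMatchSketch {

    // Mirrors the reported check: substring matching on the path strings.
    static boolean matchesBySubstring(String inputPath, String basePath) {
        return inputPath.contains(basePath);
    }

    // Assumed alternative: Path.startsWith compares whole name elements,
    // so "raw_trips" does not match "raw_trips_cow".
    static boolean matchesByComponents(String inputPath, String basePath) {
        Path input = Paths.get(inputPath).normalize();
        Path base = Paths.get(basePath).normalize();
        return input.startsWith(base);
    }

    public static void main(String[] args) {
        String rawTripsBase = "/tmp/junit1202353861292872173/raw_trips";
        String rawTripsInput = "/tmp/junit1202353861292872173/raw_trips/2019/05/25";
        String cowInput = "/tmp/junit1202353861292872173/raw_trips_cow/2019/05/25";

        // The substring check wrongly assigns the raw_trips_cow input to raw_trips.
        System.out.println(matchesBySubstring(cowInput, rawTripsBase));      // true
        // The component-wise check rejects it and still accepts the real match.
        System.out.println(matchesByComponents(cowInput, rawTripsBase));     // false
        System.out.println(matchesByComponents(rawTripsInput, rawTripsBase)); // true
    }
}
```

A comparison that appends a path separator to the base path before a startsWith check on the strings would behave similarly for this case.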

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
