Nikolay Skovorodin created HUDI-8771:
----------------------------------------
Summary: Incorrect classification of input paths in InputPathHandler
Key: HUDI-8771
URL: https://issues.apache.org/jira/browse/HUDI-8771
Project: Apache Hudi
Issue Type: Bug
Components: hive
Reporter: Nikolay Skovorodin
Fix For: 1.1.0
In the InputPathHandler.parseInputPaths method, input paths can be classified incorrectly when one table's base path is a string prefix of another table's path.
The cause is the following check, which looks up a previously resolved metaClient for each inputPath:
{code:java}
if (inputPath.toString().contains(metaClient.getBasePath().toString())) { {code}
For example, given the input paths
* {{/tmp/junit1202353861292872173/raw_trips/2019/05/25}}
* {{/tmp/junit1202353861292872173/raw_trips_cow/2019/05/25}}
and their respective base paths
* {{/tmp/junit1202353861292872173/raw_trips}}
* {{/tmp/junit1202353861292872173/raw_trips_cow}}
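Because String.contains performs a plain substring test, both input paths are matched to the {{raw_trips}} base path, which is wrong for the second one: {{/tmp/junit1202353861292872173/raw_trips}} is a substring of {{/tmp/junit1202353861292872173/raw_trips_cow/2019/05/25}}.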
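The following standalone sketch reproduces the misclassification with the paths from this report and contrasts it with a component-aware comparison. It is illustrative only: the class {{BasePathMatchDemo}} and its methods are hypothetical, it does not use Hudi's metaClient or path classes, and it uses {{java.nio.file.Path.startsWith}} as a stand-in for whatever path type the real fix would use. It assumes plain POSIX-style paths without a URI scheme (e.g. no {{hdfs://}} prefix).

{code:java}
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class BasePathMatchDemo {

  // Mirrors the substring check quoted above: "raw_trips" also matches "raw_trips_cow".
  static boolean substringMatch(String inputPath, String basePath) {
    return inputPath.contains(basePath);
  }

  // Hypothetical alternative: Path.startsWith compares whole name elements,
  // so ".../raw_trips" is NOT a prefix of ".../raw_trips_cow/...".
  static boolean componentMatch(String inputPath, String basePath) {
    Path input = Paths.get(inputPath).normalize();
    Path base = Paths.get(basePath).normalize();
    return input.startsWith(base);
  }

  // Returns the first base path accepted by the chosen matcher, in list order,
  // similar to looping over previously resolved metaClients. Returns null if none match.
  static String firstMatch(String inputPath, List<String> basePaths, boolean componentAware) {
    for (String basePath : basePaths) {
      boolean matched = componentAware
          ? componentMatch(inputPath, basePath)
          : substringMatch(inputPath, basePath);
      if (matched) {
        return basePath;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    String root = "/tmp/junit1202353861292872173";
    // raw_trips is resolved first, as in the example above.
    List<String> basePaths = List.of(root + "/raw_trips", root + "/raw_trips_cow");
    String cowInput = root + "/raw_trips_cow/2019/05/25";

    // prints /tmp/junit1202353861292872173/raw_trips (misclassified)
    System.out.println(firstMatch(cowInput, basePaths, false));
    // prints /tmp/junit1202353861292872173/raw_trips_cow
    System.out.println(firstMatch(cowInput, basePaths, true));
  }
}
{code}

Comparing whole path components rather than raw strings is one way to avoid the false match; an equivalent string-only check would accept the base path only when the input path equals it or starts with the base path followed by a separator.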