[
https://issues.apache.org/jira/browse/HUDI-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Wong updated HUDI-628:
-----------------------------
Description:
The [https://hudi.apache.org/docs/quick-start-guide.html] example data has a
column `partitionpath` which holds values like `americas/brazil/sao_paulo`.
Using the docker environment's spark-shell, you can change the basePath from
the quickstart to save to hdfs://user/hive/warehouse/hudi_trips_cow and write
the table. Then you can see the folder in the HDFS browser, similar to the
stock_ticks_cow folder created in the docker demo.
However, if you try to use run_sync_tool.sh to sync the table to Hive, you get
the error: "java.lang.IllegalArgumentException: Partition key parts
[partitionpath] does not match with partition values [americas, brazil,
sao_paulo]. Check partition strategy. "
{quote}{{/var/hoodie/ws/hudi-hive/run_sync_tool.sh --jdbc-url
jdbc:hive2://hiveserver:10000 --user hive --pass hive --partitioned-by
partitionpath --partition-value-extractor
org.apache.hudi.hive.MultiPartKeysValueExtractor --base-path
/user/hive/warehouse/hudi_trips_cow --database default --table
hudi_trips_cow}}
{quote}
This error is thrown in `HoodieHiveClient.getPartitionClause`, which uses
`extractPartitionValuesInPath` to get a list of partitionValues. The problem is
that it compares the number of partitionValues to the number of partition
fields. In this example, there is only 1 partition field, "partitionpath",
whose value is split into 3 partitionValues. The two counts differ (1 vs. 3),
so the check fails and throws the exception.
See
[https://github.com/apache/incubator-hudi/blob/master/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java#L182]
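To make the mismatch concrete, here is a minimal standalone sketch of the kind of size check described above. It is not the actual Hudi source: the class `PartitionCheckSketch`, its two methods, and the field names `region`, `country`, and `city` are hypothetical, and the extractor is assumed to simply split the partition path on "/".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartitionCheckSketch {

    // Hypothetical stand-in for a multi-part extractor:
    // assumed to split the partition path on "/".
    public static List<String> extractValues(String partitionPath) {
        return Arrays.asList(partitionPath.split("/"));
    }

    // Hypothetical stand-in for the check described in this issue:
    // the number of partition fields must equal the number of extracted values.
    public static void checkPartition(List<String> partitionFields, String partitionPath) {
        List<String> partitionValues = extractValues(partitionPath);
        if (partitionFields.size() != partitionValues.size()) {
            throw new IllegalArgumentException("Partition key parts " + partitionFields
                + " does not match with partition values " + partitionValues
                + ". Check partition strategy. ");
        }
    }

    public static void main(String[] args) {
        String path = "americas/brazil/sao_paulo";

        // One field against three values: the check rejects it.
        try {
            checkPartition(Collections.singletonList("partitionpath"), path);
        } catch (IllegalArgumentException e) {
            System.out.println("1 field:  " + e.getMessage());
        }

        // Three (hypothetical) fields against three values: the check passes.
        checkPartition(Arrays.asList("region", "country", "city"), path);
        System.out.println("3 fields: check passed");
    }
}
```

Under these assumptions, the check only passes when --partitioned-by lists as many fields as the path has segments, which the quickstart's single `partitionpath` column does not.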
was:
The [https://hudi.apache.org/docs/quick-start-guide.html] example data has a
column `partitionpath` which holds values like `americas/brazil/sao_paulo`.
Using the docker environment's spark-shell, you can change the basePath from
the quickstart to save to hdfs://user/hive/warehouse/hudi_trips_cow. Then you
can see the folder in the HDFS browser, similar to the stock_ticks_cow folder
created in the docker demo.
However, if you try to use run_sync_tool.sh to sync the table to Hive, you get
the error: "java.lang.IllegalArgumentException: Partition key parts
[partitionpath] does not match with partition values [americas, brazil,
sao_paulo]. Check partition strategy. "
{quote}{{/var/hoodie/ws/hudi-hive/run_sync_tool.sh --jdbc-url
jdbc:hive2://hiveserver:10000 --user hive --pass hive --partitioned-by
partitionpath --partition-value-extractor
org.apache.hudi.hive.MultiPartKeysValueExtractor -MultiPartKeysValueExtractor
-base-path /user/hive/warehouse/hudi_trips_cow --database default --table
hudi_trips_cow}}
{quote}
This error is thrown in `HoodieHiveClient.getPartitionClause`, which uses
`extractPartitionValuesInPath` to get a list of partitionValues. The problem is
that it compares the length of the partitionValues to the length of the
partitionField. In this example, there is only 1 partitionField,
"partitionpath," which is split into 3 partitionValues. Thus the check fails
and throws the exception.
See
[https://github.com/apache/incubator-hudi/blob/master/hudi-hive/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java#L182]
> MultiPartKeysValueExtractor does not work with run_sync_tool.sh
> ---------------------------------------------------------------
>
> Key: HUDI-628
> URL: https://issues.apache.org/jira/browse/HUDI-628
> Project: Apache Hudi (incubating)
> Issue Type: Bug
> Reporter: Andrew Wong
> Priority: Major
> Attachments: stack_trace.txt
>