[jira] (HUDI-1527) Automatically infer the data directory, users only need to specify the table directory

Raymond Xu (Jira) Sat, 12 Mar 2022 06:17:09 -0800


    [ https://issues.apache.org/jira/browse/HUDI-1527 ]



    Raymond Xu deleted comment on HUDI-1527:
    ----------------------------------

was (Author: githubbot):
rmahindra123 opened a new pull request #3353:
URL: https://github.com/apache/hudi/pull/3353


   ## What is the purpose of the pull request
   
   This PR carries over the changes from #2475, with the following fixes:
   1. The core logic is fixed so that it works correctly
   2. The code is restructured and cleaned up, and the logic that detects 
full-table reads is decoupled from the existing read logic
   3. New tests are added, and the tests now actually verify the outputs
   
   Original PR description: To read a Hudi table, users must specify a path 
that is not just the table path but a glob determined by the partition 
directory structure. Different key generators produce different partition 
directory structures: a single-level partition layout needs 
path=.../table/*/*, while a two-level layout needs path=.../table/*/*/*. 
Requiring users to work out this data path is cumbersome; they should only 
need to specify the table path: .../table
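As a rough illustration of the rule above (not code from the PR), the glob contains one `/*` per partition level plus one more for the data files. `glob_for_table` and its `partition_levels` parameter are hypothetical names used only for this sketch.

```python
def glob_for_table(table_path: str, partition_levels: int) -> str:
    """Build the glob path a Hudi read required before this change.

    One '/*' per partition directory level, plus one more to match the data files.
    """
    if partition_levels < 0:
        raise ValueError("partition_levels must be >= 0")
    return table_path.rstrip("/") + "/*" * (partition_levels + 1)


print(glob_for_table("/data/table", 1))  # /data/table/*/*
print(glob_for_table("/data/table", 2))  # /data/table/*/*/*
```

The point of the change is that users should not need to know `partition_levels` at all.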
   
   In addition, once a Hudi table can be read with path=.../table, it becomes 
easier to query it with Spark SQL: adding the table property 
spark.sql.sources.provider=hudi to the Hive table metadata is enough for the 
Hive table to be read as a Hudi table automatically.
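The following example assumes the statement is issued through Hive (for example, via beeline) against an existing Hive table. Under that assumption, the property could be set with Hive's `ALTER TABLE ... SET TBLPROPERTIES`. The helper only builds that statement. `hudi_provider_ddl` and the table name `db.trips` are hypothetical. Whether Spark needs further table properties depends on the Spark and Hudi versions, and that is not checked here.

```python
def hudi_provider_ddl(table_name: str) -> str:
    """Build a Hive DDL statement that tags an existing table with the Hudi provider.

    The property key/value come from the PR description; other properties that a
    given Spark/Hudi version may also require are not covered by this sketch.
    """
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES ('spark.sql.sources.provider'='hudi')"
    )


print(hudi_provider_ddl("db.trips"))
```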
   
   ## Brief change log
   
   Added logic to the `createRelation()` method in `DefaultSource` to detect 
when a user specifies only the table base path, rather than a glob, when 
reading the entire table. When this is detected, the path is automatically 
rewritten as a glob, so the rest of the read logic works as before.
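The detection-and-rewrite step can be sketched as follows. This is a hypothetical local-filesystem illustration, not the Scala code in `DefaultSource.createRelation()`, and it does not necessarily infer partition depth the same way the PR does. It makes two assumptions:

- A Hudi table root is recognizable by its `.hoodie` metadata folder.
- Every partition directory contains a `.hoodie_partition_metadata` file.

The function names are made up for the example.

```python
import os

GLOB_CHARS = ("*", "?", "[")


def is_glob(path: str) -> bool:
    """True if the path already contains glob metacharacters."""
    return any(c in path for c in GLOB_CHARS)


def infer_partition_depth(table_path: str) -> int:
    """Count partition levels by locating the first partition metadata file.

    Assumes each partition directory holds a `.hoodie_partition_metadata` file.
    The `.hoodie` folder is pruned from the walk so its internals are ignored.
    """
    for root, dirs, files in os.walk(table_path):
        dirs[:] = [d for d in dirs if d != ".hoodie"]
        if ".hoodie_partition_metadata" in files:
            rel = os.path.relpath(root, table_path)
            return 0 if rel == "." else len(rel.split(os.sep))
    return 0  # no marker found: treat the table as non-partitioned


def resolve_read_path(path: str) -> str:
    """Rewrite a bare table base path into a glob; leave anything else unchanged."""
    if is_glob(path) or not os.path.isdir(os.path.join(path, ".hoodie")):
        return path
    depth = infer_partition_depth(path)
    # One '/*' per partition level, plus one for the data files.
    return path.rstrip("/") + "/*" * (depth + 1)
```

For a table whose partitions look like `2021/03/`, `resolve_read_path("/data/table")` would return `/data/table/*/*/*`. An explicit glob, or a directory without `.hoodie`, is passed through unchanged, which keeps existing callers working.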
   
   ## Verify this pull request
   
   Added unit and functional tests covering all cases: no partitioning, 
single-level partitioning, date partitioning, and custom partitioning/key 
generation.
   
   




> Automatically infer the data directory, users only need to specify the table 
> directory
> --------------------------------------------------------------------------------------
>
>                 Key: HUDI-1527
>                 URL: https://issues.apache.org/jira/browse/HUDI-1527
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: teeyog
>            Priority: Major
>              Labels: pull-request-available
>
> To read a Hudi table, users must specify a path that is not just the table 
> path but a glob determined by the partition directory structure. Different 
> key generators produce different partition directory structures: a 
> single-level partition layout needs path=.../table/*/*, while a two-level 
> layout needs path=.../table/*/*/*. Requiring users to work out this data 
> path is cumbersome; they should only need to specify the table path: 
> .../table
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
