[GitHub] [hudi] nbalajee opened a new pull request #2344: [HUDI-1470] In the hudi-test-suite, use the latest writer schema, when reading from existing parquet files

GitBox Thu, 17 Dec 2020 16:27:15 -0800


nbalajee opened a new pull request #2344:
URL: https://github.com/apache/hudi/pull/2344



   
   ## What is the purpose of the pull request
   When the hudi-test-suite reads records from existing parquet files, it reads them with the schema stored in each file, that is, the original schema used to write that file. The Hudi writer core, by contrast, reads existing parquet files with the evolved (latest) writer schema. The difference is visible only when testing schema evolution, where the latest schema differs from the schema the file was written with.
   
   To mimic the Hudi writer core behavior, this change reads the parquet files with the latest writer schema.
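   The reader/writer schema difference described in this section can be illustrated with a small, self-contained model of Avro-style schema resolution. This is a sketch under stated assumptions: schemas are flat lists of named fields with optional defaults, records are plain dicts, and `Field`, `read_with_schema`, and the field names are hypothetical. It is not the DFSHoodieDatasetInputReader code, nor the parquet-avro or Hudi API.

```python
from typing import Any, Dict, List, NamedTuple

# Marker meaning "this field declares no default value".
NO_DEFAULT = object()


class Field(NamedTuple):
    name: str
    default: Any = NO_DEFAULT


def read_with_schema(stored: Dict[str, Any], read_schema: List[Field]) -> Dict[str, Any]:
    """Project a record stored under an older schema onto read_schema.

    Simplified (hypothetical) Avro-style resolution: matching fields keep
    their stored value, fields added later take their default, and stored
    fields absent from read_schema are dropped.
    """
    result: Dict[str, Any] = {}
    for field in read_schema:
        if field.name in stored:
            result[field.name] = stored[field.name]
        elif field.default is not NO_DEFAULT:
            result[field.name] = field.default
        else:
            raise ValueError(f"field {field.name!r} is not in the data and has no default")
    return result


# Hypothetical schemas: LATEST evolved ORIGINAL by adding a "fare" field.
ORIGINAL = [Field("_row_key"), Field("rider")]
LATEST = [Field("_row_key"), Field("rider"), Field("fare", 0.0)]

# A row that was written to an existing file with ORIGINAL.
old_row = {"_row_key": "k1", "rider": "r1"}

print(read_with_schema(old_row, ORIGINAL))  # {'_row_key': 'k1', 'rider': 'r1'}
print(read_with_schema(old_row, LATEST))    # {'_row_key': 'k1', 'rider': 'r1', 'fare': 0.0}
```

   In this model, reading the old row with the file's own schema yields a record without "fare", while reading it with the latest writer schema yields the shape a writer using the evolved schema would see, with the new field filled from its default.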
   
   ## Brief change log
   
   - Modified readParquetOrLogFiles() in DFSHoodieDatasetInputReader to read with the latest writer schema.

   ## Verify this pull request
   
   This pull request is already covered by existing tests, such as testSimpleHoodieDatasetReader().
   
   ## Committer checklist
   
    - [x] Has a corresponding JIRA in PR title & commit

    - [x] Commit message is descriptive of the change

    - [x] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


