xushiyan commented on code in PR #5647:
URL: https://github.com/apache/hudi/pull/5647#discussion_r878807563
##########
website/docs/hoodie_deltastreamer.md:
##########
@@ -303,6 +303,11 @@ other formats and then write data as Hudi format.)
 - ORC
 - HUDI
+For DFS sources the following behaviors are expected:
+
+- For JSON file format you always need to inform a schema. If the target hudi table follows the same schema from the source file, you just need to inform the schema for source, if don't you need to inform schemas for both.
+- `HoodieDeltaStreamer` reads the files under the source path (`hoodie.deltastreamer.source.dfs.root`) directly, so you should not expect the tool to recognize partitions under this path as fields of the dataset. Detailed examples can be found [here](https://github.com/apache/hudi/issues/5485).

Review Comment:
```suggestion
- `HoodieDeltaStreamer` reads the files under the source base path (`hoodie.deltastreamer.source.dfs.root`) directly, and it won't use the partition paths under this base path as fields of the dataset. Detailed examples can be found [here](https://github.com/apache/hudi/issues/5485).
```
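A minimal Python sketch of the partition-path behavior discussed in this review, assuming newline-delimited JSON files laid out in Hive-style `key=value` directories under the source base path. `read_records` and `partition_values` are hypothetical helpers for illustration only; they are not Hudi APIs and this is not how `HoodieDeltaStreamer` is implemented. The point is that record fields come only from file contents, so a `date=...` directory segment does not become a field unless something parses the path separately.

```python
import json
import os
import tempfile
from typing import Dict, List


def read_records(root: str) -> List[Dict]:
    """Hypothetical reader: load JSON-lines records from every file under root.

    Fields come only from the file contents; directory names such as
    date=2022-05-20 are NOT added to the records.
    """
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            with open(os.path.join(dirpath, name), encoding="utf-8") as fh:
                for line in fh:
                    line = line.strip()
                    if line:  # skip blank lines
                        records.append(json.loads(line))
    return records


def partition_values(root: str, file_path: str) -> Dict[str, str]:
    """Hypothetical helper, shown only for contrast: parse Hive-style
    key=value directory segments between root and the file."""
    rel_dir = os.path.dirname(os.path.relpath(file_path, root))
    values = {}
    for segment in rel_dir.split(os.sep):
        if "=" in segment:
            key, value = segment.split("=", 1)
            values[key] = value
    return values


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        part_dir = os.path.join(root, "date=2022-05-20")
        os.makedirs(part_dir)
        path = os.path.join(part_dir, "part-0.json")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(json.dumps({"id": 1, "name": "a"}) + "\n")
        print(read_records(root))            # [{'id': 1, 'name': 'a'}]
        print(partition_values(root, path))  # {'date': '2022-05-20'}
```

If a partition value is needed as a column, it has to be present in the file contents themselves (or added by a separate step); the directory layout alone does not supply it in this sketch.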
