GitHub user rahil-c edited a discussion: RFC-100: Lance File Format support in 
Hudi

## ✅ Lance File Format Integration Tasks

See the following feature for more context: 
https://github.com/apache/hudi/issues/14127

This discussion tracks the new feature for supporting unstructured data in 
Hudi via formats like Lance that are focused on AI/ML use cases. Below is the 
initial scope of what we are targeting. Note that this list will continue to 
grow as we get deeper into the integration; for now it aims to first support 
the Hudi Spark Client:

- [ ] Add base `HoodieFileWriter` for Lance with a Spark implementation, 
[PR](https://github.com/apache/hudi/pull/14131)
- [ ] Add base `HoodieFileReader` for Lance with a Spark implementation, 
[PR](https://github.com/apache/hudi/pull/14132)
- [ ] Add basic Avro → Arrow schema conversion, 
[PR](https://github.com/apache/hudi/pull/14132)
- [ ] Add `SparkColumnarFileReader` implementation for Lance, WIP
- [ ] Implement append-only validation (bulk insert), WIP
- [ ] Implement insert / upsert / delete validation, WIP
- [ ] Integrate Lance as a log file format  
- [ ] Add predicate (filter) push-down  
- [ ] Support `ColumnarBatch` vectorized reading
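
To make the "basic Avro → Arrow schema conversion" item above more concrete, here is a minimal sketch of what a primitive-only mapping could look like. It is not the code from the linked PR: the function names (`convert_field_type`, `avro_record_to_arrow_fields`), the lowercase string labels used for Arrow types, and the restriction to primitives plus two-branch `["null", T]` unions are all assumptions made for illustration. It uses plain Python dicts rather than the Avro or Arrow libraries.

```python
# Hypothetical sketch only: NOT Hudi's converter and not the code in the PR.
# Avro schemas are modeled as plain dicts (as parsed from Avro JSON), and
# Arrow types are represented by illustrative string labels.

# Avro primitive type name -> label for the corresponding Arrow type.
AVRO_TO_ARROW = {
    "null": "null",
    "boolean": "bool",
    "int": "int32",      # Avro int is a 32-bit signed integer
    "long": "int64",     # Avro long is a 64-bit signed integer
    "float": "float32",
    "double": "float64",
    "bytes": "binary",
    "string": "utf8",
}


def convert_field_type(avro_type):
    """Return (arrow_type_label, nullable) for one Avro field type.

    Supports primitives and two-branch unions of the form ["null", T]
    (in either order). Anything else raises ValueError.
    """
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        if len(avro_type) != 2 or len(non_null) != 1:
            raise ValueError(f"unsupported union: {avro_type!r}")
        arrow_type, _ = convert_field_type(non_null[0])
        return arrow_type, True
    if not isinstance(avro_type, str) or avro_type not in AVRO_TO_ARROW:
        raise ValueError(f"unsupported Avro type: {avro_type!r}")
    # A bare "null" type can only ever hold null, so treat it as nullable.
    return AVRO_TO_ARROW[avro_type], avro_type == "null"


def avro_record_to_arrow_fields(record_schema):
    """Convert an Avro record schema dict to a list of (name, type, nullable)."""
    if record_schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return [
        (field["name"], *convert_field_type(field["type"]))
        for field in record_schema["fields"]
    ]


if __name__ == "__main__":
    example = {
        "type": "record",
        "name": "Example",
        "fields": [
            {"name": "id", "type": "long"},
            {"name": "caption", "type": ["null", "string"]},
            {"name": "image", "type": "bytes"},
        ],
    }
    for name, arrow_type, nullable in avro_record_to_arrow_fields(example):
        print(name, arrow_type, nullable)
```

A real conversion would also need to cover logical types (timestamps, decimals), nested records, arrays, and maps, which this sketch deliberately rejects.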

Changes will be made on the following open source feature branch: 
https://github.com/apache/hudi/tree/feature-branch-rfc100-unstructured-data

GitHub link: https://github.com/apache/hudi/discussions/14128
