Hi Raymond,

Thanks for starting this discussion.

Agree on 1. (We may also need some CLI support for inspecting bad records, and code samples for consuming them?)

On 2, these places seem appropriate. We can figure it out in more detail when we get to implementation?

On 3, +1 on logs. We should also define a standard schema for the error record. I see some tricky issues to handle here for schema mismatch errors. For example, if the core problem was a schema mismatch, then serializing/deserializing the error record without a working schema specific to that record may not be possible? Maybe we need the record data itself in some schemaless format like JSON?

I also wonder if we should write the error table as another internal HoodieTable (we are abstracting out HoodieTable, FileGroupIO etc. anyway)?

On 4, +1 again.

On Fri, May 22, 2020 at 7:47 PM Shiyan Xu <[email protected]> wrote:

> Hi all,
>
> I'd like to bring up this discussion around handling errors in Hudi write
> paths.
> https://issues.apache.org/jira/browse/HUDI-648
>
> Trying to gather some feedbacks about the implementation details
> 1. Error location
> I'm thinking of writing the failed records to `.hoodie/errors/` for
> a) encapsulate data within the Hudi table for ease of management
> b) make use of existing dedicated directory
>
> 2. Write path
> org.apache.hudi.client.HoodieWriteClient#postWrite
> org.apache.hudi.client.HoodieWriteClient#completeCompaction
> These 2 methods should be the places to persist failed records in
> `org.apache.hudi.table.action.HoodieWriteMetadata#writeStatuses`
> to the designated location
>
> 3. Format
> Records should be written as logs (avro)
>
> 4. Metric
> Post writing failed records, it should send a metric of basic count of
> errors written. Easier for monitoring system to pick up and send alert.
>
> Foreseeably, some details may need to be adjusted throughout the
> development. To begin with, we may agree on a feasible plan at high level.
>
> Please kindly share thoughts and feedbacks. Thank you.
>
> Regards,
> Raymond
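
To make the schemaless idea on point 3 a bit more concrete, here is a rough sketch, not a proposal for the final design. It assumes a fixed envelope of plain string fields, with the failed record carried as opaque JSON text so that reading an error record never depends on the schema that failed. `ErrorRecord`, its field names, and the `SCHEMA_MISMATCH` label are all hypothetical; none of them exist in Hudi today.

```java
/**
 * Hypothetical error-record envelope (not an existing Hudi class).
 * Every field is a plain string, so the envelope itself can always be
 * read back even when the original record's schema is broken.
 */
final class ErrorRecord {
    final String recordKey;
    final String partitionPath;
    final String errorType;     // e.g. "SCHEMA_MISMATCH" (assumed label)
    final String errorMessage;
    final String payloadJson;   // the failed record rendered as JSON text

    ErrorRecord(String recordKey, String partitionPath, String errorType,
                String errorMessage, String payloadJson) {
        this.recordKey = recordKey;
        this.partitionPath = partitionPath;
        this.errorType = errorType;
        this.errorMessage = errorMessage;
        this.payloadJson = payloadJson;
    }

    /** Quotes and escapes a value so it is a valid JSON string. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the 4-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes the envelope; the payload stays an opaque, escaped string. */
    String toJson() {
        return "{"
            + "\"recordKey\":" + quote(recordKey) + ","
            + "\"partitionPath\":" + quote(partitionPath) + ","
            + "\"errorType\":" + quote(errorType) + ","
            + "\"errorMessage\":" + quote(errorMessage) + ","
            + "\"payload\":" + quote(payloadJson)
            + "}";
    }

    public static void main(String[] args) {
        ErrorRecord r = new ErrorRecord("key-1", "2020/05/22",
            "SCHEMA_MISMATCH", "field \"ts\" missing", "{\"id\":1}");
        System.out.println(r.toJson());
    }
}
```

Keeping the payload as an escaped string rather than a nested object means the envelope could still be given a fixed Avro schema (all string fields) if we stay with logs, while the payload itself is parsed only when someone inspects it.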
