Thanks! Will review and get back to you.

On Tue, Jun 2, 2020 at 10:37 AM Shiyan Xu <[email protected]> wrote:
> Thank you for the feedback, Vinoth. Agreed with your points. Also created a
> small RFC for easy alignment on the changes:
>
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+20+%3A+handle+failed+records
>
> On Sun, May 24, 2020 at 12:06 AM Vinoth Chandar <[email protected]> wrote:
>
> > Hi Raymond,
> >
> > Thanks for starting this discussion.
> >
> > Agree on 1. (We may also need some CLI support for inspecting bad records,
> > and also code samples to consume them, etc.?)
> >
> > On 2, these places seem appropriate. We can figure it out in more detail
> > when we get to implementation?
> >
> > On 3, +1 on logs. We should also define a standard schema for error
> > records. I see some tricky issues to handle here for schema mismatch
> > errors. E.g., if the core problem was a schema mismatch, then
> > serializing/deserializing the error record without a working schema
> > specific to that record may not be possible? Maybe we need the record data
> > itself in some schemaless format like JSON?
> > I also wonder if we should write the error table as another internal
> > HoodieTable (we are abstracting out HoodieTable, FileGroupIO, etc. anyway)?
> >
> > On 4, +1 again.
> >
> > On Fri, May 22, 2020 at 7:47 PM Shiyan Xu <[email protected]>
> > wrote:
> >
> > > Hi all,
> > >
> > > I'd like to bring up this discussion around handling errors in Hudi
> > > write paths.
> > > https://issues.apache.org/jira/browse/HUDI-648
> > >
> > > Trying to gather some feedback about the implementation details.
> > >
> > > 1. Error location
> > > I'm thinking of writing the failed records to `.hoodie/errors/` to
> > > a) encapsulate data within the Hudi table for ease of management
> > > b) make use of the existing dedicated directory
> > >
> > > 2. Write path
> > > org.apache.hudi.client.HoodieWriteClient#postWrite
> > > org.apache.hudi.client.HoodieWriteClient#completeCompaction
> > > These 2 methods should be the places to persist failed records in
> > > `org.apache.hudi.table.action.HoodieWriteMetadata#writeStatuses`
> > > to the designated location.
> > >
> > > 3. Format
> > > Records should be written as logs (avro).
> > >
> > > 4. Metric
> > > After writing failed records, it should send a metric with a basic count
> > > of errors written. This makes it easier for a monitoring system to pick
> > > up and send alerts.
> > >
> > > Foreseeably, some details may need to be adjusted throughout the
> > > development. To begin with, we may agree on a feasible plan at a high
> > > level.
> > >
> > > Please kindly share thoughts and feedback. Thank you.
> > >
> > > Regards,
> > > Raymond
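
To make the schemaless idea from point 3 concrete, here is a rough sketch in plain Java. It is not Hudi code: `ErrorRecordSketch`, `ErrorRecord`, `quote`, and the field names are all hypothetical, and nothing here reflects what RFC-20 ends up specifying. The assumption is that the failed payload is carried as an opaque JSON string inside a small envelope, so the error record can be written and read back even when the record's own schema is the thing that failed. The list size at the end stands in for the basic error count from point 4.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these names exist in Hudi.
class ErrorRecordSketch {

    /** Schemaless envelope for one failed record. */
    static final class ErrorRecord {
        final String recordKey;
        final String errorClass;
        final String errorMessage;
        final String payloadJson; // original record rendered as JSON text, kept opaque

        ErrorRecord(String recordKey, String errorClass,
                    String errorMessage, String payloadJson) {
            this.recordKey = recordKey;
            this.errorClass = errorClass;
            this.errorMessage = errorMessage;
            this.payloadJson = payloadJson;
        }

        /** Renders the envelope as a single JSON line; the payload is embedded as a string. */
        String toJsonLine() {
            return "{\"recordKey\":" + quote(recordKey)
                + ",\"errorClass\":" + quote(errorClass)
                + ",\"errorMessage\":" + quote(errorMessage)
                + ",\"payload\":" + quote(payloadJson) + "}";
        }
    }

    /** Minimal JSON string escaping, so a malformed payload cannot break the envelope. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        List<ErrorRecord> failed = new ArrayList<>();
        failed.add(new ErrorRecord("key-1", "SchemaMismatch",
            "missing field \"ts\"", "{\"id\":1}"));

        // One JSON line per failed record.
        for (ErrorRecord r : failed) {
            System.out.println(r.toJsonLine());
        }
        // Basic count that a metrics reporter could publish.
        System.out.println("errors written: " + failed.size());
    }
}
```

Embedding the payload as an escaped string rather than as nested JSON is a deliberate choice in this sketch: the envelope stays parseable even if the payload itself is not valid JSON.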
