Hello Vinoth,

As promised, here's a PR into 0.5.2 - I think it might be worth bringing that into master / 0.5.3 as well. But I figured I'd at least get this PR out there for someone to review.

https://github.com/apache/incubator-hudi/pull/1597

For what it's worth, there were definitely some pain points I encountered:

* checkstyle.xml - not supported well at all in IntelliJ - downloaded a plugin and it didn't help - had to compile, find errors, rinse / repeat
* There are libraries included in the pom.xml that for some reason are not allowed per the checkstyle.xml (org.apache.commons.*) - doesn't make sense - why have them in the project if you can't use them?
* Would have thought everything was good to go after running unit tests, but when deploying to a real cluster, found that the entire class had to be serializable - would have been nice to know that beforehand, as that would have saved several cycles - probably worth documenting somewhere?
* Don't really know where I should document these changes, as I only found out how to do these things via Vinoth's original reply to my email - would be nice if there was some sort of "extending Hudi" documentation somewhere

Hope this becomes useful for someone else.

FYI - this is working perfectly for my use-case. Unit tests show several different approaches but I wouldn't mind throwing some documentation together to help folks out.

Let me know if you need anything else to help move this along - surely I can't be the only one that needed it! :-)

Allen

On Tue, May 5, 2020 at 11:22 AM Vinoth Chandar <[email protected]> wrote:

> Great!
>
> On Mon, May 4, 2020 at 5:43 PM Allen Underwood <[email protected]> wrote:
>
> > Hi Vinoth,
> >
> > Yes I was going to set some things up in the morning. I'll let you know
> > how it turns out and if it's worth a PR I'll get one together.
> >
> > Thanks again for your help!
> >
> > Allen
> >
> > On Mon, May 4, 2020 at 8:40 PM Vinoth Chandar <[email protected]> wrote:
> >
> >> Thanks both!
> >>
> >> @allen heard this many times :) hear you. You could write a small class
> >> yourself with your custom logic and throw it in there?
> >>
> >> If you think there is a way to fix the key generator in Hudi to be more
> >> resilient to these (e.g. taking in a list of supported patterns vs just
> >> the one), let us know.
> >>
> >> On Mon, May 4, 2020 at 3:08 PM Allen Underwood <[email protected]> wrote:
> >>
> >> > Hi Vinoth - that was extremely helpful...I almost had it working,
> >> > HOWEVER, it appears that some of my dates have the ms on the end and
> >> > others don't....so if I pick a time format with them, then the ones
> >> > without them fail and vice versa....Good times.
> >> >
> >> > After I figure this out I'll see if I can put this information
> >> > somewhere easy to find.
> >> >
> >> > On Mon, May 4, 2020 at 12:23 PM Vinoth Chandar <[email protected]> wrote:
> >> >
> >> >> Hi Allen,
> >> >>
> >> >> You are able to configure the key generator for deltastreamer using
> >> >> this property (either via a file or --config )
> >> >> hoodie.datasource.write.keygenerator.class
> >> >>
> >> >> You might be interested in this built-in generator.
> >> >>
> >> >> https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedKeyGenerator.java#L64
> >> >>
> >> >> It lets you configure a field as a recordKey, and if you can parse
> >> >> your timestamp using Java SimpleDateFormat, you can specify the
> >> >> datetime field and a pattern to parse it into.
> >> >>
> >> >> Happy to make this work for you.
> >> >>
> >> >> community, any volunteers to faq/document this? :)
> >> >>
> >> >> On Mon, May 4, 2020 at 9:11 AM Allen Underwood <[email protected]> wrote:
> >> >>
> >> >> > I've tried to do my due diligence by googling / searching this
> >> >> > slack and I've come up empty. Is there a way through configuration /
> >> >> > deltastreamer to extract a custom partition key? Basically I have a
> >> >> > datetime field in a Kafka Source that has an ISO8601 datetime....is
> >> >> > there a way to extract a partition value out of that? I found this
> >> >> > after some Googling, but this seems like it'd only be useful if I
> >> >> > wanted to write my own writer application:
> >> >> >
> >> >> > https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java
> >> >> >
> >> >> > Any way to do what I need through configuration of the spark job /
> >> >> > hudi configuration?
> >> >> >
> >> >> > --
> >> >> > *Allen Underwood*
> >> >
> >> > --
> >> > *Allen Underwood*
> >> > Principal Software Engineer
> >> > Broadcom | Symantec Enterprise Division
> >> > *Mobile*: 404.808.5926
> >
> > --
> > *Allen Underwood*
> > Principal Software Engineer
> > Broadcom | Symantec Enterprise Division
> > *Mobile*: 404.808.5926

--
*Allen Underwood*
Principal Software Engineer
Broadcom | Symantec Enterprise Division
*Mobile*: 404.808.5926
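
For anyone reading this thread later: the mixed-format problem described above (some timestamps with milliseconds, some without) and the "list of supported patterns" idea can be illustrated with a small standalone sketch. This is not the code in the PR and it does not use Hudi's key generator API. The class name `MultiPatternPartitionPath`, the method `partitionPathFor`, and the example patterns are all hypothetical, and `java.time` is used here in place of the SimpleDateFormat mentioned in the thread. The class keeps only strings as fields so that it stays serializable, which matters given the cluster deployment issue noted above.

```java
import java.io.Serializable;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hudi): derives a partition path from a
// timestamp string by trying several input patterns in order.
class MultiPatternPartitionPath implements Serializable {
    private static final long serialVersionUID = 1L;

    // Only strings are stored, because DateTimeFormatter is not Serializable.
    private final ArrayList<String> inputPatterns;
    private final String outputPattern;

    MultiPatternPartitionPath(List<String> inputPatterns, String outputPattern) {
        this.inputPatterns = new ArrayList<>(inputPatterns);
        this.outputPattern = outputPattern;
    }

    // Returns the formatted partition path for the first pattern that matches
    // the whole input, or throws if none of the configured patterns match.
    String partitionPathFor(String timestamp) {
        for (String pattern : inputPatterns) {
            try {
                LocalDateTime parsed =
                    LocalDateTime.parse(timestamp, DateTimeFormatter.ofPattern(pattern));
                return parsed.format(DateTimeFormatter.ofPattern(outputPattern));
            } catch (DateTimeParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        throw new IllegalArgumentException("No configured pattern matches: " + timestamp);
    }

    public static void main(String[] args) {
        // The trailing 'Z' is matched as a literal, so no time zone conversion happens.
        MultiPatternPartitionPath paths = new MultiPatternPartitionPath(
            Arrays.asList("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", "yyyy-MM-dd'T'HH:mm:ss'Z'"),
            "yyyy/MM/dd");
        System.out.println(paths.partitionPathFor("2020-05-04T15:08:00.123Z")); // 2020/05/04
        System.out.println(paths.partitionPathFor("2020-05-04T15:08:00Z"));     // 2020/05/04
    }
}
```

A real key generator would need to plug this kind of logic into whatever class is named by hoodie.datasource.write.keygenerator.class; the sketch only shows the pattern-fallback step.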
