Ah, no problem at all. We are all here to make it better :) Let me spend some time over the weekend on the PR and these suggestions and see if I can make some changes upstream.
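
For anyone who lands on this thread later, here is a minimal sketch of the two issues raised in the quoted messages below: a non-serializable `DateTimeFormatter` field, and timestamps that sometimes carry milliseconds and sometimes don't. It is not the code from the PR. The class name `PartitionPathSketch` and both patterns are made up for illustration, and it deliberately does not extend Hudi's `KeyGenerator`. The idea is to keep only the pattern strings as serialized state and rebuild the formatters lazily. A `transient` field comes back as `null` after Java deserialization, which may be why marking the formatter transient on its own did not help.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hudi: turns an ISO-8601 timestamp into a partition path.
class PartitionPathSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Plain strings serialize fine, so they carry the configuration.
    private final String inputPattern;
    private final String outputPattern;

    // DateTimeFormatter is not Serializable; transient fields are null after deserialization.
    private transient DateTimeFormatter inputFormatter;
    private transient DateTimeFormatter outputFormatter;

    PartitionPathSketch(String inputPattern, String outputPattern) {
        this.inputPattern = inputPattern;
        this.outputPattern = outputPattern;
    }

    // Rebuild the formatters on first use, including after deserialization.
    private DateTimeFormatter input() {
        if (inputFormatter == null) {
            inputFormatter = DateTimeFormatter.ofPattern(inputPattern);
        }
        return inputFormatter;
    }

    private DateTimeFormatter output() {
        if (outputFormatter == null) {
            outputFormatter = DateTimeFormatter.ofPattern(outputPattern);
        }
        return outputFormatter;
    }

    String partitionPath(String isoTimestamp) {
        OffsetDateTime ts = OffsetDateTime.parse(isoTimestamp, input());
        return output().format(ts.withOffsetSameInstant(ZoneOffset.UTC));
    }

    public static void main(String[] args) throws Exception {
        // "[.SSS]" is an optional section, so values with and without millis both parse.
        PartitionPathSketch original =
            new PartitionPathSketch("uuuu-MM-dd'T'HH:mm:ss[.SSS]XXX", "uuuu/MM/dd");

        // Round-trip through Java serialization, roughly what happens when Spark ships the object.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        PartitionPathSketch copy;
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            copy = (PartitionPathSketch) in.readObject();
        }

        System.out.println(copy.partitionPath("2020-05-04T12:23:45Z"));
        System.out.println(copy.partitionPath("2020-05-04T12:23:45.123Z"));
    }
}
```

Running the serialization round-trip in a unit test like this is one way to catch the problem before deploying to a cluster. How this would plug into an actual key generator is left out here, since that depends on the Hudi version.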

On Thu, May 7, 2020 at 6:44 AM Allen Underwood <[email protected]> wrote:

> Hi Vinoth,
>
> Thank you for the reply! I'll go ahead and retarget the PR for master.
>
> On the checkstyle plugin, I tried to set it up and must have misconfigured something because it never showed up in the IDE. I'm sure it was me being in a rush to make progress so I bailed when I didn't get immediate results :-)
>
> The commons jar is in the project POM in the release-0.5.2 branch.
>
> And on the Serializable - basically everything checks out fine in unit tests and all that. But apparently the Spark applications that use it lean heavily on serializable classes, so if you had any instance variables that are not serializable, you'd get runtime errors - basically I'd tried to keep a DateTimeFormatter instance variable and it only blew up when running as a Spark job. The interesting thing was I tried to make it a transient variable and that didn't work either, which leads me to believe that the Spark jobs were actually deserializing those variables to use later on - it was a nice learning curve for me.
>
> And definitely not throwing stones on the documentation - documentation is not easy...
>
> Thanks again!
>
> Allen
>
> On Thu, May 7, 2020 at 2:40 AM Vinoth Chandar <[email protected]> wrote:
>
>> Hi Allen,
>>
>> Thanks for the valuable feedback! Can we retarget this to master.. and the 0.5.3 RM can backport it on top of 0.5.2..
>>
>> Sharing what I know on these points.
>>
>> - Once you install the checkstyle plugin, you can set up IntelliJ to use the checkstyle file as the code style and that has been working fairly well for me at least.
>> - Hmmm. master does not have commons .. did you use checkstyle from master or 0.5.1? We disallowed some libraries like apache commons/guava since they cause jar/class mismatches a lot when integrating into all these query engines :)
>> - Agree.. `public abstract class KeyGenerator implements Serializable` should have taken care of it, I would think so. You are referring to the KeyGenerator impl, right?
>> - Docs are definitely being worked on. Pratyaksh has the JIRA assigned for now I think. IMO we can add this to `writing data` page.
>>
>> On Wed, May 6, 2020 at 12:20 PM Allen Underwood <[email protected]> wrote:
>>
>> > Hello Vinoth,
>> >
>> > As promised, here's a PR into 0.5.2 - I think it might be worth bringing that into master / 0.5.3 as well. But I figured I'd at least get this PR out there for someone to review.
>> > https://github.com/apache/incubator-hudi/pull/1597
>> >
>> > For what it's worth, there were definitely some pain points I encountered:
>> > * checkstyle.xml - not supported well at all in IntelliJ - downloaded a plugin and it didn't help - had to compile, find errors, rinse / repeat
>> > * There are libraries included in the pom.xml that for some reason are not allowed per the checkstyle.xml - doesn't make sense (org.apache.commons.*) - why have them in the project if you can't use them?
>> > * Would have thought everything was good to go after running unit tests, but when deploying to a real cluster, found that the entire class had to be serializable - would have been nice to know that beforehand as that would have saved several cycles. - probably worth documenting somewhere?
>> > * Don't really know where I should document these changes as I only found out how to do these things via Vinoth's original reply to my email - would be nice if there was some sort of "extending Hudi" documentation somewhere
>> >
>> > Hope this becomes useful for someone else. FYI - this is working perfectly for my use-case. Unit tests show several different approaches but I wouldn't mind throwing some documentation together to help folks out.
>> >
>> > Let me know if you need anything else to help move this along - surely I can't be the only one that needed it! :-)
>> >
>> > Allen
>> >
>> > On Tue, May 5, 2020 at 11:22 AM Vinoth Chandar <[email protected]> wrote:
>> >
>> >> Great!
>> >>
>> >> On Mon, May 4, 2020 at 5:43 PM Allen Underwood <[email protected]> wrote:
>> >>
>> >> > Hi Vinoth,
>> >> >
>> >> > Yes I was going to set some things up in the morning. I’ll let you know how it turns out and if it’s worth a PR I’ll get one together.
>> >> >
>> >> > Thanks again for your help!
>> >> >
>> >> > Allen
>> >> >
>> >> > On Mon, May 4, 2020 at 8:40 PM Vinoth Chandar <[email protected]> wrote:
>> >> >
>> >> >> Thanks both!
>> >> >>
>> >> >> @allen heard this many times :) hear you. You could write a small class yourself with your custom logic and throw it in there?
>> >> >>
>> >> >> If you think there is a way to fix the key generator in Hudi to be more resilient to these (e.g. taking in a list of supported patterns vs just the one), let us know.
>> >> >>
>> >> >> On Mon, May 4, 2020 at 3:08 PM Allen Underwood <[email protected]> wrote:
>> >> >>
>> >> >> > Hi Vinoth - that was extremely helpful... I almost had it working, HOWEVER, it appears I have dates where some have the ms on the end and others don't... so if I pick a time format with them, then the ones without them fail and vice versa... Good times.
>> >> >> >
>> >> >> > After I figure this out I'll see if I can put this information somewhere easy to find.
>> >> >> >
>> >> >> > On Mon, May 4, 2020 at 12:23 PM Vinoth Chandar <[email protected]> wrote:
>> >> >> >
>> >> >> >> Hi Allen,
>> >> >> >>
>> >> >> >> You are able to configure the key generator for deltastreamer using this property (either via a file or --config )
>> >> >> >> hoodie.datasource.write.keygenerator.class
>> >> >> >>
>> >> >> >> You might be interested in this built-in generator.
>> >> >> >>
>> >> >> >> https://github.com/apache/incubator-hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedKeyGenerator.java#L64
>> >> >> >> It lets you configure a field as a recordKey, and if you can parse your timestamp using Java SimpleDateFormat, you can specify the datetime field and a pattern to parse it into..
>> >> >> >>
>> >> >> >> Happy to make this work for you.
>> >> >> >>
>> >> >> >> community, any volunteers to faq/document this? :)
>> >> >> >>
>> >> >> >> On Mon, May 4, 2020 at 9:11 AM Allen Underwood <[email protected]> wrote:
>> >> >> >>
>> >> >> >> > I’ve tried to do my due diligence by googling / searching this slack and I’ve come up empty. Is there a way through configuration / deltastreamer to extract a custom partition key? Basically I have a datetime field in a Kafka Source that has an ISO8601 datetime… is there a way to extract a partition value out of that? I found this after some Googling, but this seems like it’d only be useful if I wanted to write my own writer application:
>> >> >> >> >
>> >> >> >> > https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java
>> >> >> >> >
>> >> >> >> > Any way to do what I need through configuration of the spark job / hudi configuration?
>> >> >> >> >
>> >> >> >> > hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java
>> >> >> >> > <https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java>
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > *Allen Underwood*
>> >> >> >
>> >> >> > --
>> >> >> > *Allen Underwood*
>> >> >> > Principal Software Engineer
>> >> >> > Broadcom | Symantec Enterprise Division
>> >> >> > *Mobile*: 404.808.5926
>> >> >
>> >> > --
>> >> > *Allen Underwood*
>> >> > Principal Software Engineer
>> >> > Broadcom | Symantec Enterprise Division
>> >> > *Mobile*: 404.808.5926
>> >
>> > --
>> > *Allen Underwood*
>> > Principal Software Engineer
>> > Broadcom | Symantec Enterprise Division
>> > *Mobile*: 404.808.5926
>
> --
> *Allen Underwood*
> Principal Software Engineer
> Broadcom | Symantec Enterprise Division
> *Mobile*: 404.808.5926
