Hi Travis,

Thanks a ton for this issue; I know I will enjoy solving it (:
I have some questions about this JIRA, even though I think I understand what the problem is.
- How do you think I should approach this? I mean, if HCat can't send the partitions' information through the configuration object, maybe we should think of a different way of communicating this information (Thrift, or the database)?
- I was looking at HCatLoader, but I am not sure if this would be a good entry point for the modifications. Any suggestions?

Thanks again Travis!

Renato M.

2012/8/30 Travis Crawford <[email protected]>:
> You might be interested in https://issues.apache.org/jira/browse/HCATALOG-453
>
> The issue here is HCatalog queries the HiveMetaStore for info about
> the partitions to process, and stores that response in the job conf.
> When processing large numbers of partitions this bloats the job conf
> beyond what Hadoop will allow and the job fails.
>
> What's interesting about this issue is you'll learn about the main
> feature of HCatalog - translating db+table+partition_spec into a list
> of partitions, how HCat handles that internally, and how it's
> communicated between the frontend & backend. The actual issue is
> straightforward, but I think spending the time to understand the
> problem will give a great overview of how HCat works.
>
> Thoughts?
>
> --travis
>
>
>
> On Thu, Aug 30, 2012 at 4:25 PM, Renato Marroquín Mogrovejo
> <[email protected]> wrote:
>> Travis,
>>
>> Thanks a lot for your response! My master's dissertation was about
>> using statistics to smarten up Apache Pig rule optimizer, so I would
>> love to help out with something related, but maybe you can suggest
>> some interesting jiras (not complicated ones but maybe "noobies" ones)
>> I can start with (:
>> And yeah the labels thing is much better than creating a JIRA type for
>> noobies. Thanks again!
>>
>>
>> Renato M.
>>
>> 2012/8/30 Travis Crawford <[email protected]>:
>>> Hey Renato -
>>>
>>> Awesome! What in particular are you interested in starting out with?
>>> We can definitely find a starter project for you in that area.
>>>
>>> JIRA issues can have a variety of attributes; the attribute I started
>>> this thread about is the "issue type".
>>>
>>> JIRA also has "labels", which I think are a great place to indicate
>>> something would be good for noobies. For example, there could be an
>>> "issue type" of bug, with "label" noobie.
>>>
>>> Let us know what area you're interested in diving into and we can help
>>> come up with a starter project for ya.
>>>
>>> --travis
>>>
>>>
>>> On Thu, Aug 30, 2012 at 9:21 AM, Renato Marroquín Mogrovejo
>>> <[email protected]> wrote:
>>>> Hi all,
>>>>
>>>> I am new to HCatalog but I would like to get involved with the
>>>> project, and one thing that would totally help is to create an issue
>>>> type that indicates it is for "newbies". I saw that in Apache Pig they
>>>> have a special type of issue for this and with this they try to engage
>>>> more with the community. This would be awesome guys!
>>>> Thanks in advance!
>>>>
>>>>
>>>> Renato M.
>>>>
>>>> 2012/8/30 Travis Crawford <[email protected]>:
>>>>> Hey hcat gurus -
>>>>>
>>>>> Filing an issue just now I noticed the list of possible option types
>>>>> is pretty crazy long - any objection to requesting a simplification
>>>>> to:
>>>>>
>>>>> PROPOSED ISSUE TYPES:
>>>>>
>>>>> Bug - fixing unintended behavior
>>>>> New Feature - addition of brand-new functionality
>>>>> Improvement - making existing functionality better
>>>>>
>>>>> CURRENT ISSUE TYPES:
>>>>>
>>>>> Bug
>>>>> New Feature
>>>>> Improvement
>>>>> Test
>>>>> Wish
>>>>> Task
>>>>> New JIRA Project
>>>>> RTC
>>>>> TCK Challenge
>>>>> Question
>>>>> Temp
>>>>> Brainstorming
>>>>> Umbrella
>>>>> Epic
>>>>> Dependency upgrade
>>>>> Suitable Name Search
>>>>>
>>>>> If this sounds good I'll ping the infra folks and try to make this happen.
>>>>>
>>>>> --travis
