[ 
https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13268205#comment-13268205
 ] 

Prashant Kommireddi commented on PIG-2600:
------------------------------------------

Thanks for the review Jon.

1. I agree on Exception handling in most cases, and thanks for catching 
"printStackTrace" in the code, it wasn't my intention to leave it in there :). 
In general, wrapping specific portions of code within try-catch is good 
practice, but I prefer not to break a try block into several when most lines 
within the method throw the same exception and there is not much other code. 
In these UDFs, Schema.getField is used more than once, and of course every 
call throws FrontendException. 
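
To illustrate what I mean, here is a rough sketch of the single-try-block 
style. It is not the code from the patch: the Schema and FrontendException 
classes below are simplified stand-ins I made up so the example compiles on 
its own, and describe() is a hypothetical method name.

```java
import java.util.Arrays;
import java.util.List;

class SingleTryBlockSketch {
    /** Stand-in for Pig's FrontendException (hypothetical, illustration only). */
    static class FrontendException extends Exception {
        FrontendException(String message) { super(message); }
    }

    /** Stand-in for a schema whose getField call throws a checked exception. */
    static class Schema {
        private final List<String> fields;

        Schema(String... fields) { this.fields = Arrays.asList(fields); }

        String getField(int position) throws FrontendException {
            if (position < 0 || position >= fields.size()) {
                throw new FrontendException("No field at position " + position);
            }
            return fields.get(position);
        }
    }

    /** Both getField calls share one try block because they fail the same way. */
    static String describe(Schema input) {
        try {
            String key = input.getField(0);
            String value = input.getField(1);
            return key + " -> " + value;
        } catch (FrontendException e) {
            // Rethrow with context instead of calling printStackTrace().
            throw new RuntimeException("Could not resolve input schema", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new Schema("k", "v")));
    }
}
```

The trade-off is that the catch block cannot tell which getField call failed, 
which is fine here because the exception message already carries the position.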

And looking at builtin UDFs for examples was really not the best idea :). Looks 
like some refactoring is required there.

2. Regarding resizing of the HashSet, trying to optimize right now might be a 
bit premature. My comment about frequent resizing would only matter if the 
number of distinct elements in the Map values were large. A HashSet starts 
with an internal table of capacity 16 and expands (allocating a new table and 
rehashing the elements into it) once the number of elements exceeds a 
threshold (capacity times the load factor, 0.75 by default). With the current 
approach, the HashSet constructor picks an initial capacity based on the size 
of the Collection passed to it. "it just adds all of the elements, and resizes 
dynamically as you add more of them" - this can be a costly operation if you 
start with a capacity of 16 and the number of distinct values in the map is in 
the thousands or millions. "since that first pass, dynamic resizing and all, 
is going to happen anyway" - the amount of resizing is not the same in the two 
cases, though. Either way, it is too early to be thinking about optimization 
there :)
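
For reference, a small sketch of the two cases. The class and method names 
are made up for illustration, and estimatedResizes() only models the 
documented defaults (capacity 16, load factor 0.75, doubling on resize); it 
does not inspect the real HashSet internals.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashSetSizingSketch {
    // Documented defaults for java.util.HashSet and its backing HashMap.
    static final int DEFAULT_CAPACITY = 16;
    static final double LOAD_FACTOR = 0.75;

    /** Estimates how often a default-sized table doubles to hold n distinct elements. */
    static int estimatedResizes(long distinctElements) {
        long capacity = DEFAULT_CAPACITY;
        int resizes = 0;
        while (distinctElements > capacity * LOAD_FACTOR) {
            capacity *= 2;
            resizes++;
        }
        return resizes;
    }

    /** Current approach: the constructor sizes the table from the collection. */
    static <T> Set<T> dedupePresized(Collection<T> values) {
        return new HashSet<>(values);
    }

    /** Alternative: start at the default capacity and grow while adding. */
    static <T> Set<T> dedupeIncremental(Collection<T> values) {
        Set<T> result = new HashSet<>();
        for (T value : values) {
            result.add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "a", "c");
        System.out.println(dedupePresized(values).equals(dedupeIncremental(values)));
        System.out.println(estimatedResizes(1_000_000));
    }
}
```

Both methods return the same set; they differ only in how many intermediate 
tables get allocated along the way. Note that the constructor sizes by the 
size of the input collection, duplicates included, so it can over-allocate 
when there are few distinct values.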

I will upload a patch soon with the changes, thanks for reviewing again.

                
> Better Map support
> ------------------
>
>                 Key: PIG-2600
>                 URL: https://issues.apache.org/jira/browse/PIG-2600
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Prashant Kommireddi
>             Fix For: 0.11
>
>         Attachments: PIG-2600.patch, PIG-2600_2.patch, PIG-2600_3.patch, 
> PIG-2600_4.patch, PIG-2600_5.patch
>
>
> It would be nice if Pig played better with Maps. To that end, I'd like to add 
> a lot of utility around Maps.
> - TOBAG should take a Map and output {(key, value)}
> - TOMAP should take a Bag in that same form and make a map.
> - KEYSET should return the set of keys.
> - VALUESET should return the set of values.
> - VALUELIST should return the List of values (no deduping).
> - INVERSEMAP would return a Map of values => the set of keys that refer to 
> that value
> This would all be pretty easy. A more substantial piece of work would be to 
> make Pig support non-String keys (this is especially an issue since UDFs and 
> whatnot probably assume that they are all Integers). Not sure if it is worth 
> it.
> I'd love to hear other things that would be useful for people!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
