*Hi Everyone,*

Earlier we made some changes on https://github.com/sigmoidanalytics/spork/tree/spork-pig-12 to achieve complete e2e coverage. We did not restrict ourselves from changing the Pig codebase itself, since that was slightly easier to do.
We are now working on merging these changes into https://github.com/apache/pig/tree/spark, so we had to revisit them and, for each one, either find a workaround or propose the change on trunk.

Below is the gist of the code changes made outside of the Spark code, for which the related code can be found here <http://goo.gl/nRgldU>:

1. Had to comment out PigStatsUtil.addNativeJobStats(PigStats.get(), this, true); to get the native (mapred) operator working
2. PigRecordReader - changes to identify endOfAllInput
3. POUserFunc - made properties attribute public
4. POCollectedGroup - getNextTuple modified to identify the end of all input
5. POFRJoin - made LRs attribute public to use it during FR join
6. POMergeJoin - made LRs attribute public to use it during merge join
7. POStream - problem with identifying endOfAllInput, made some changes
8. JsonLoader - made properties public to use from JsonStorage
9. JsonStorage - uses properties from JsonLoader
10. PigStorage - mRequiredColumns attribute
11. BinSedesTuple, BinSedesTupleFactory - made the classes serializable
12. SchemaTupleBackend - changes to initialize stbInstance when null

I would like to get suggestions up front, before I submit the related patches and move the discussion to a per-issue basis.

BTW, below are the JIRA issues related to the above changes, which I will be working on. Anyone interested in taking them up, please feel free to comment on the issue.

PIG-4193, PIG-4189, PIG-4190, PIG-4192, PIG-4200, PIG-4207, PIG-4208, PIG-4209

Thanks,
Praveen R
