> On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote: > > src/org/apache/pig/builtin/StreamUDFToPig.java, line 47 > > <https://reviews.apache.org/r/13781/diff/2/?file=348274#file348274line47> > > > > This might be affected by PIG-3255
Looks like it would be easy to make this patch work with PIG-3255. If we could get that merged shortly I'd update this patch with the necessary changes. > On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote: > > src/org/apache/pig/builtin/StreamUDFToPig.java, line 109 > > <https://reviews.apache.org/r/13781/diff/2/?file=348274#file348274line109> > > > > How do we handle delimit escaping? Seems we don't handle it right now? We don't handle it now. We use a 3 character delimiter which avoids collisions in practice. I would actually like to switch the serialization to a 'real' serialization library but we did a little bit of prototyping a few months ago and there wasn't anything out there that worked really well and had significant performance improvements. Avro was the most promising but on the Python side it was missing a number of features we needed. > On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote: > > src/org/apache/pig/builtin/StreamingUDF.java, line 169 > > <https://reviews.apache.org/r/13781/diff/2/?file=348276#file348276line169> > > > > Are those files only for illustrate? Why not just use a random tmp > > file? Why do we distinguish exectype in illustrate? It only run locally. I renamed the class to remove Illustrate from the name. This is actually used both for illustrate (where we capture the output only from the last run) and in real runs (where we write the output to the user logs). By putting it in the user logs you can get the logs from the task tracker web interface. We could use temp files but I like the more human readable names. > On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote: > > src/org/apache/pig/builtin/StreamingUDF.java, line 198 > > <https://reviews.apache.org/r/13781/diff/2/?file=348276#file348276line198> > > > > Any reason not to create a StreamingUDFInputHandler? I didn't really think it was necessary but I guess its more consistent. > On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote: > > src/org/apache/pig/builtin/StreamingUDF.java, line 366 > > <https://reviews.apache.org/r/13781/diff/2/?file=348276#file348276line366> > > > > Seems flush() is more proper to put inside inputHandler I'm not sure if it would be unnecessary for some clients of InputHandler. For the StreamingUDF we need to make sure we always flush or else we can get blocked on the reading process not getting the full input (and then never writing the output) but I'm not sure if its always necessary. > On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote: > > src/org/apache/pig/impl/streaming/InputHandler.java, line 66 > > <https://reviews.apache.org/r/13781/diff/2/?file=348282#file348282line66> > > > > Why we need this change? Probably just left over from debugging where I wanted to know the value of the byte[]. > On Sept. 7, 2013, 12:04 a.m., Daniel Dai wrote: > > test/unit-tests, line 73 > > <https://reviews.apache.org/r/13781/diff/2/?file=348302#file348302line73> > > > > This is not true unit test, should remove from the list Can it go in commit-tests? - Jeremy ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13781/#review25964 ----------------------------------------------------------- On Sept. 5, 2013, 8:23 p.m., Jeremy Karn wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/13781/ > ----------------------------------------------------------- > > (Updated Sept. 5, 2013, 8:23 p.m.) > > > Review request for pig. > > > Repository: pig-git > > > Description > ------- > > Changes for PIG-2417 (https://issues.apache.org/jira/browse/PIG-2417) > > > Diffs > ----- > > build.xml 7e22192 > src/org/apache/pig/PigToStream.java 7cc2950 > src/org/apache/pig/StreamToPig.java ff24b27 > src/org/apache/pig/builtin/PigStreaming.java 5467693 > src/org/apache/pig/builtin/PigToStreamUDF.java PRE-CREATION > src/org/apache/pig/builtin/StreamUDFToPig.java PRE-CREATION > src/org/apache/pig/builtin/StreamingDelimiters.java PRE-CREATION > src/org/apache/pig/builtin/StreamingUDF.java PRE-CREATION > src/org/apache/pig/builtin/StreamingUDFException.java PRE-CREATION > src/org/apache/pig/builtin/StreamingUDFOutputSchemaException.java > PRE-CREATION > src/org/apache/pig/impl/streaming/DefaultInputHandler.java 301bea3 > src/org/apache/pig/impl/streaming/DefaultOutputHandler.java 1b46e7d > src/org/apache/pig/impl/streaming/ExecutableManager.java cf79c83 > src/org/apache/pig/impl/streaming/InputHandler.java 690d94e > src/org/apache/pig/impl/streaming/OutputHandler.java 6e9262a > src/org/apache/pig/impl/streaming/StreamingUDFOutputHandler.java > PRE-CREATION > src/org/apache/pig/impl/streaming/StreamingUtil.java PRE-CREATION > src/org/apache/pig/impl/util/JarManager.java 5c4acb0 > src/org/apache/pig/impl/util/StorageUtil.java dcb62ec > src/org/apache/pig/scripting/ScriptEngine.java 29a9e1f > src/org/apache/pig/scripting/ScriptingIllustrateOutputCapturer.java > PRE-CREATION > src/org/apache/pig/scripting/streaming/python/PythonScriptEngine.java > PRE-CREATION > src/python/streaming/controller.py PRE-CREATION > src/python/streaming/pig_util.py PRE-CREATION > test/org/apache/pig/builtin/TestPigToStreamUDF.java PRE-CREATION > test/org/apache/pig/builtin/TestStreamUDFToPig.java PRE-CREATION > test/org/apache/pig/builtin/TestStreamingUDF.java PRE-CREATION > test/org/apache/pig/impl/streaming/TestExecutableManager.java 6246019 > test/org/apache/pig/impl/streaming/TestStreamingUDFOutputHandler.java > PRE-CREATION > test/org/apache/pig/impl/streaming/TestStreamingUtil.java PRE-CREATION > test/org/apache/pig/test/TestPigStreaming.java PRE-CREATION > test/org/apache/pig/test/TestStreaming.java 1eac5d2 > test/python/streaming/test_controller.py PRE-CREATION > test/unit-tests d52ad9d > > Diff: https://reviews.apache.org/r/13781/diff/ > > > Testing > ------- > > > Thanks, > > Jeremy Karn > >
