Hi Russell,

Based on our email discussion, I wrote up my proposal for the Pig visualizer
project and submitted it to google-melange. Please take a look at it at
your convenience. Any further feedback or comments would be much
appreciated.

Thank you very much.
Best,

On Sun, Mar 25, 2012 at 9:25 PM, Russell Jurney <russell.jur...@gmail.com> wrote:

> I suggest you create a simple, minimal web application that visualizes a
> pig script file each time a url with the script filename is loaded.
>
> For instance, the process to use the tool might go like this:
>
> 1) Run pigvisualizer.(pl/py/rb) locally, at the start of your pig work
> session
> 2) Create a new pig script at /my/dir/filename.pig
> 3) Open http://localhost:4567/pigviz/my/dir/filename.pig in a web browser
> 4) See a javascript-based visualization of your pig script
> 5) Reload this web page each time you want to see a new visualization OR
> have the page try to reload the file periodically
>
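
A minimal sketch of the workflow above, using only Python's standard library instead of one of the web frameworks. The /pigviz prefix and port 4567 come from the example URL; the names script_path_from_url, PigVizHandler and serve are made up for illustration, and the plain-text response is only a placeholder for the real visualization.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path
from urllib.parse import unquote, urlparse

PREFIX = "/pigviz"  # URL prefix taken from the example URL


def script_path_from_url(url_path):
    """Map /pigviz/my/dir/filename.pig to /my/dir/filename.pig, else None."""
    path = unquote(urlparse(url_path).path)
    if not path.startswith(PREFIX + "/") or not path.endswith(".pig"):
        return None
    return path[len(PREFIX):]


class PigVizHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: returns the script text as a placeholder."""

    def do_GET(self):
        script = script_path_from_url(self.path)
        if script is None or not Path(script).is_file():
            self.send_error(404, "No such pig script")
            return
        # The file is re-read on every request, so reloading the page
        # always reflects the latest saved version of the script.
        body = Path(script).read_text(encoding="utf-8").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=4567):
    """Start the local server; blocks until interrupted."""
    HTTPServer(("localhost", port), PigVizHandler).serve_forever()
```

After calling serve(), opening http://localhost:4567/pigviz/my/dir/filename.pig would return the current contents of /my/dir/filename.pig. Since this exposes any readable .pig file, it is only meant to be bound to localhost.
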
> There are several sources of data:
>
> 1) Start a pig session, via grunt, PigServer or HCatalog, and use
> ILLUSTRATE/EXPLAIN.  An old example of doing this is available at
> https://github.com/rjurney/Cloud-Stenography
> 2) Use the explain or -dot commands from pig command line. In looking at
> the dot output, the graph is not as helpful as I had thought :(
> 3) Use the PigPen code to get ILLUSTRATE data for visualization
>
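
For source 2, a rough sketch of turning dot output into JSON that a browser-side graph library could load. The sample dot text is invented for illustration and is not real Pig output. The regex only handles simple one-edge-per-line statements (no chains like a -> b -> c, no quoted names with spaces), and the nodes/edges layout is an assumption rather than a format required by any particular library.

```python
import json
import re

# Matches one simple edge per line, e.g.  load_a -> filter_b;
EDGE_RE = re.compile(r'"?([\w.]+)"?\s*->\s*"?([\w.]+)"?')


def dot_to_graph(dot_text):
    """Collect nodes and edges from simple 'a -> b' lines of a dot file."""
    nodes, edges = [], []
    for line in dot_text.splitlines():
        match = EDGE_RE.search(line)
        if not match:
            continue
        src, dst = match.groups()
        for name in (src, dst):
            if name not in nodes:  # keep first-seen order, no duplicates
                nodes.append(name)
        edges.append({"source": src, "target": dst})
    return {"nodes": [{"id": name} for name in nodes], "edges": edges}


# Invented sample input, NOT actual output of the pig -dot option.
SAMPLE = """
digraph plan {
    load_a -> filter_b;
    filter_b -> group_c;
    group_c -> store_d;
}
"""

print(json.dumps(dot_to_graph(SAMPLE), indent=2))
```
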
> The ideal situation is that you get the data plan via EXPLAIN, and sample
> data via ILLUSTRATE, and combine them to produce an even better version of
> figure 2 in the paper
> http://infolab.stanford.edu/~olston/publications/sigmod09.pdf
>
> [image: Inline image 1]
>
> As to the presentation of the data in an interface, I suggest you AVOID
> eclipse and the UI code to PigPen, as there is little utility in having
> this visualization there.  Not all Pig users use Eclipse, and there is
> little utility in editing scripts in the diagrams.  There is great utility
> in visualizing, understanding and debugging this way, but not so much in
> editing.
>
> On the other hand, anyone can edit Pig in their favorite tool and view
> their pig graph in a simple web application on their localhost by directing
> a web browser at it.  This is why a simple, small web application seems
> best. You can use ruby/sinatra or python/bottle/flask or perl/catalyst to
> make a simple web app.  Check out sigma.js for graph visualization:
> http://sigmajs.org/examples.html or http://neyric.github.com/wireit/ for
> something more fully featured.
>
> Perhaps the best plan is to fix ILLUSTRATE (see
> http://wiki.apache.org/pig/ExampleGenerator and talk to the guys at
> mortardata.com who have a patch for this), and edit the PigPen code to
> remove the Eclipse dependencies and have it output simple JSON for a web
> application to consume.  It could write to a file, or you could create a
> simple web service that publishes JSON for the current pig session.
>
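
One possible shape for that JSON, sketched with made-up data: each alias carries its schema and a few sample rows, and the result is written to a file for the web application to read. The field names (alias, schema, rows), the function names, and the sample values are all hypothetical; nothing here reflects what ILLUSTRATE or PigPen actually emit.

```python
import json
import os
import tempfile


def illustrate_to_json(steps):
    """Serialize a list of (alias, schema, rows) tuples to a JSON string."""
    payload = {
        "steps": [
            {"alias": alias, "schema": schema, "rows": rows}
            for alias, schema, rows in steps
        ]
    }
    return json.dumps(payload, indent=2)


def write_atomically(text, path):
    """Write to a temp file, then rename, so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text)
    os.replace(tmp_path, path)


# Hypothetical sample data standing in for ILLUSTRATE results.
EXAMPLE_STEPS = [
    ("visits", ["user:chararray", "url:chararray"],
     [["amy", "cnn.com"], ["fred", "bbc.com"]]),
    ("by_user", ["group:chararray", "count:long"],
     [["amy", 1], ["fred", 1]]),
]
```

Calling write_atomically(illustrate_to_json(EXAMPLE_STEPS), "illustrate.json") would produce a file that a page can poll. The rename step matters if the page reloads periodically while the file is being rewritten.
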
> Once we have JSON of ILLUSTRATE... getting a web visualization is easy.  I
> can help, I've done it before in Cloud Stenography by parsing data in
> Grunt.  Which you could do, btw.  Old Perl code is available on github (see
> above link).
>
> Interested in thoughts of others.
>
> On Fri, Mar 23, 2012 at 11:21 PM, Shasha Liu <grassons...@gmail.com> wrote:
>
>> Hi Daniel,
>>
>> Thanks a lot for the reply.
>> I installed the latest Pig and read through the book "Programming Pig".
>> I managed to use "-dot -out filename" to produce three graphs in dot file
>> format.
>>
>> Based on the existing dot files, my next question is: what are the
>> requirements for a better visualizer?
>> Are we going to generate a picture (e.g., .png) for different plans
>> (logical plan, physical plan, map reduce plan), or provide a web interface
>> to visualize these graphs of plans?
>>
>> Best regards,
>>  --
>> Shasha(Amy) Liu
>>
>>
>> On Sun, Mar 18, 2012 at 3:30 AM, Daniel Dai <da...@hortonworks.com> wrote:
>>
>>> See comments inline.
>>>
>>> On Sat, Mar 17, 2012 at 6:52 AM, grassonsand <grassons...@gmail.com>
>>> wrote:
>>> > Dear all,
>>> >
>>> > I am a Ph.D. student in Computer Science with four years of Java
>>> > programming experience, focused on Java web development.
>>> > Among the candidate projects in PIG, I am interested in PIG-2586 (A better
>>> > plan/data flow visualizer) and PIG-2599 (Mavenize Pig).
>>> >
>>> > In my ongoing research project, I am in charge of (1) web user interface
>>> > development and (2) the build system. I am now working on adding hadoop
>>> > capability to the project. The main reason I am interested in the PIG
>>> > project is that I can contribute to the PIG community based on my
>>> > previous experience, learn from participating in GSoC this year, and
>>> > benefit my ongoing research project at the same time.
>>> >
>>> > (1). User interface development
>>> > I have used several graphics libraries to visualize semantic data and our
>>> > own data set, e.g., Jung, graphviz, BIRT, and several plot plugins in
>>> > jquery. Therefore, I am interested in working on a new tool for PIG
>>> > visualizer. After looking through the issue, I have several questions:
>>> >    (i) As both swing and javascript are mentioned, is this project a web
>>> > or standalone application?
>>> >    (ii) As ruby-graphviz is included, is ruby required for this project?
>>>
>>> I envision two visualization components in Pig. One is a lightweight
>>> visualizer invoked by Grunt, which should be fast and concise, and
>>> integrated into the explain command. The other is a standalone composer
>>> similar to PigPen, which should be much more powerful. PIG-2586 is intended
>>> to track the first, but Russell's comment is talking about the second.
>>> Both are acceptable as a GSoC project. I leave it to Russell.
>>>
>>> >
>>> > (2). Build system
>>> > The code base of my research project is 40K loc and the build script was
>>> > written in Ant. Part of my duty is to convert the ant build script to
>>> > maven and maintain the build script. Therefore, Mavenize Pig is of
>>> > interest to me too. The build.xml in the PIG project is more complicated
>>> > than the ones I have worked on before. It includes ant, maven and ivy.
>>> > Do we need to use maven to do all the tasks and get rid of all the
>>> > dependencies on ant and ivy?
>>>
>>> Yes
>>>
>>> >
>>> >  Best regards
>>> >  Shasha(Amy) Liu
>>>
>>
>>
>>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
>



-- 
Shasha(Amy) Liu
