Re: [GSoC 2012] Self Introduction and interested projects

Russell Jurney Sun, 25 Mar 2012 18:25:39 -0700

I suggest you create a simple, minimal web application that visualizes a
pig script file each time a url with the script filename is loaded.

For instance, the process to use the tool might go like this:

1) Run pigvisualizer.(pl/py/rb) locally, at the start of your pig work
session
2) Create a new pig script at /my/dir/filename.pig
3) Open http://localhost:4567/pigviz/my/dir/filename.pig in a web browser
4) See a javascript-based visualization of your pig script
5) Reload this web page each time you want to see a new visualization OR
have the page try to reload the file periodically
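
The URL-to-script mapping in steps 2 and 3 could look something like the
sketch below. It uses only the Python standard library (wsgiref) rather
than any of the frameworks named in this mail. The names
script_path_from_url, app and main are made up for illustration, and for
now it returns the script source as JSON instead of a real visualization.
It is meant for localhost use only, since it will serve any readable .pig
file.

```python
import json
from pathlib import Path
from wsgiref.simple_server import make_server

PREFIX = "/pigviz"  # URL prefix taken from the example URL above


def script_path_from_url(url_path):
    """Map /pigviz/my/dir/filename.pig to /my/dir/filename.pig, else None."""
    if not url_path.startswith(PREFIX + "/") or not url_path.endswith(".pig"):
        return None
    return Path(url_path[len(PREFIX):])


def app(environ, start_response):
    """WSGI app: re-read the script on every request, so a reload is fresh."""
    path = script_path_from_url(environ.get("PATH_INFO", ""))
    if path is None or not path.is_file():
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no such pig script\n"]
    # Placeholder payload: a real tool would return plan/sample data here.
    payload = {"script": str(path), "source": path.read_text()}
    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps(payload).encode("utf-8")]


def main():
    """Serve on the port used in the example URL until interrupted."""
    make_server("localhost", 4567, app).serve_forever()
```

Calling main() and opening http://localhost:4567/pigviz/my/dir/filename.pig
would then return the current contents of /my/dir/filename.pig on each
reload.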

There are several sources of data:

1) Start a pig session, via Grunt, PigServer or HCatalog, and use
ILLUSTRATE/EXPLAIN.  An old example of doing this is available at
https://github.com/rjurney/Cloud-Stenography
2) Use explain with the -dot option from the pig command line. In looking at
the dot output, the graph is not as helpful as I had thought :(
3) Use the PigPen code to get ILLUSTRATE data for visualization
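
For option 2, a web front end would need the dot output as nodes and
edges. A rough sketch of that conversion follows. The sample dot text is
hypothetical and much simpler than what Pig actually emits (no clusters,
labels or attributes), and the regex only handles plain "a -> b" pairs,
not chains like "a -> b -> c". The name dot_to_graph is made up for
illustration.

```python
import json
import re

# Matches a simple edge such as:  load_a -> filter_b
EDGE = re.compile(r'"?(\w+)"?\s*->\s*"?(\w+)"?')


def dot_to_graph(dot_text):
    """Collect nodes and edges from simple 'a -> b' statements in dot text."""
    nodes, edges = [], []
    for src, dst in EDGE.findall(dot_text):
        for name in (src, dst):
            if name not in nodes:  # keep first-seen order, no duplicates
                nodes.append(name)
        edges.append({"source": src, "target": dst})
    return {"nodes": [{"id": n} for n in nodes], "edges": edges}


# Hypothetical, hand-written dot text; not real Pig output.
SAMPLE_DOT = """
digraph plan {
    load_a -> filter_b;
    filter_b -> group_c;
    group_c -> store_d;
}
"""

print(json.dumps(dot_to_graph(SAMPLE_DOT), indent=2))
```

A real dot parser would be the safer choice once the actual Pig output is
in hand. This only shows the shape of the data a JavaScript graph library
would consume.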

The ideal situation is that you get the data plan via EXPLAIN, and sample
data via ILLUSTRATE, and combine them to produce an even better version of
figure 2 in the paper
http://infolab.stanford.edu/~olston/publications/sigmod09.pdf
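
As a sketch of what "combine them" could mean in data terms: take a plan
graph (from EXPLAIN) and a table of sample rows per alias (from
ILLUSTRATE), and attach the rows to the matching nodes. The JSON shape,
the alias names and the rows below are all invented for illustration. They
are not actual Pig output and not copied from the paper.

```python
import json


def attach_samples(plan, samples):
    """Return a new graph where each node carries the sample rows for its alias."""
    nodes = [dict(node, sample=samples.get(node["id"], []))
             for node in plan["nodes"]]
    return {"nodes": nodes, "edges": list(plan["edges"])}


# Hypothetical plan graph, keyed by alias.
plan = {
    "nodes": [{"id": "visits"}, {"id": "grouped"}],
    "edges": [{"source": "visits", "target": "grouped"}],
}

# Hypothetical sample rows per alias, as ILLUSTRATE might provide them.
samples = {
    "visits": [["amy", "cnn.com"], ["amy", "frogs.com"]],
    "grouped": [["amy", 2]],
}

print(json.dumps(attach_samples(plan, samples), indent=2))
```

Nodes with no sample rows get an empty list, so the front end can draw
every node the same way.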

[image: Inline image 1]

As to the presentation of the data in an interface, I suggest you AVOID
Eclipse and the PigPen UI code, as there is little utility in having
this visualization there.  Not all Pig users use Eclipse, and there is
little utility in editing scripts in the diagrams.  There is great utility
in visualizing, understanding and debugging this way, but not so much in
editing.

On the other hand, anyone can edit Pig in their favorite tool and view
their pig graph in a simple web application on their localhost by directing
a web browser at it.  This is why a simple, small web application seems
best. You can use ruby/sinatra or python/bottle/flask or perl/catalyst to
make a simple web app.  Check out sigma.js for graph visualization:
http://sigmajs.org/examples.html or http://neyric.github.com/wireit/ for
something more fully featured.

Perhaps the best plan is to fix ILLUSTRATE (see
http://wiki.apache.org/pig/ExampleGenerator and talk to the guys at
mortardata.com who have a patch for this), and edit the PigPen code to
remove the Eclipse dependencies and have it output simple JSON for a web
application to consume.  It could write to a file, or you could create a
simple web service that publishes JSON for the current pig session.

Once we have JSON of ILLUSTRATE... getting a web visualization is easy.  I
can help, I've done it before in Cloud Stenography by parsing data in
Grunt.  Which you could do, btw.  Old Perl code is available on github (see
above link).

Interested in thoughts of others.

On Fri, Mar 23, 2012 at 11:21 PM, Shasha Liu <grassons...@gmail.com> wrote:

> Hi Daniel,
>
> Thanks a lot for the reply.
> I installed the latest Pig and read through the book "Programming
> Pig".
> I managed to use "-dot -out filename" to produce three graphs in dot file
> format.
>
> Based on the existing dot file, my next question is what is the
> requirement regarding a better visualizer?
> Are we going to generate a picture (e.g., .png) for different plans
> (logical plan, physical plan, map reduce plan), or provide a web interface
> to visualize these graphs of plans?
>
> Best regards,
> --
> Shasha(Amy) Liu
>
>
> On Sun, Mar 18, 2012 at 3:30 AM, Daniel Dai <da...@hortonworks.com> wrote:
>
>> See comments inline.
>>
>> On Sat, Mar 17, 2012 at 6:52 AM, grassonsand <grassons...@gmail.com>
>> wrote:
>> > Dear all,
>> >
>> > I am a Ph.D. student in Computer Science and have 4 years of Java
>> > programming experience focusing on Java Web development.
>> > In the candidate projects in PIG, I am interested in PIG-2586 (A better
>> > plan/data flow visualizer) and PIG-2599 (Mavenize Pig).
>> >
>> > In my on-going research project, I am in charge of (1) web user
>> > interface development and (2) the build system. Now I am working on
>> > adding Hadoop capability to the project. The main reason I am
>> > interested in the PIG project is that I can make a contribution to the
>> > PIG community based on my previous experience, learn from
>> > participating in GSoC this year, and benefit my on-going research
>> > project at the same time.
>> >
>> > (1). User interface development
>> > I have used several graphic libraries to visualize semantic data and
>> > our own data set, e.g., Jung, graphviz, BIRT, and several plot plugins
>> > in jquery.
>> > Therefore, I am interested in working on a new tool for PIG visualizer.
>> > After looking through the bug issue, I have several questions:
>> >    (i) As both swing and javascript are mentioned, is this project a
>> >    web or standalone application?
>> >    (ii) As ruby-graphviz is included, Is ruby required for this project?
>>
>> I envision two visualizer components in Pig. One is a lightweight
>> visualizer invoked by Grunt, which should be fast and concise, and
>> integrated into the explain command. The other is a standalone composer
>> similar to PigPen, which should be much more powerful. PIG-2586 is
>> intended to track the first, but Russell's comment is talking about the
>> second. Both are acceptable as a GSoC project. I leave it to Russell.
>>
>> >
>> > (2). Build system
>> > The code base of my research project is 40K loc and the build script was
>> > written in Ant. Part of my duty is to convert the ant build script to
>> > maven and maintain the build script. Therefore, Mavenize Pig is of
>> > interest to me too. The build.xml in PIG project is more complicated
>> > than the one I worked before. It includes ant, maven and ivy. Do we
>> > need to use maven to do all the tasks and get rid of all the dependency
>> > on ant, maven and ivy?
>>
>> Yes
>>
>> >
>> >  Best regards
>> >  Shasha(Amy) Liu
>>
>
>
>

-- 
Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
