Hi Russell,

Based on the email discussions, I wrote my proposal for the Pig visualizer project and submitted it to google-melange. Please take a look at it at your convenience. I would greatly appreciate any further feedback or comments.
Thank you very much.

Best,

On Sun, Mar 25, 2012 at 9:25 PM, Russell Jurney <russell.jur...@gmail.com> wrote:

> I suggest you create a simple, minimal web application that visualizes a
> pig script file each time a url with the script filename is loaded.
>
> For instance, the process to use the tool might go like this:
>
> 1) Run pigvisualizer.(pl/py/rb) locally, at the start of your pig work
> session
> 2) Create a new pig script at /my/dir/filename.pig
> 3) Open http://localhost:4567/pigviz/my/dir/filename.pig in a web browser
> 4) See a javascript-based visualization of your pig script
> 5) Reload this web page each time you want to see a new visualization OR
> have the page try to reload the file periodically
>
> There are several sources of data:
>
> 1) Start a pig session, via grunt, PigServer or HCatalog, and use
> ILLUSTRATE/EXPLAIN. An old example of doing this is available at
> https://github.com/rjurney/Cloud-Stenography
> 2) Use the explain or -dot commands from pig command line. In looking at
> the dot output, the graph is not as helpful as I had thought :(
> 3) Use the PigPen code to get ILLUSTRATE data for visualization
>
> The ideal situation is that you get the data plan via EXPLAIN, and sample
> data via ILLUSTRATE, and combine them to produce an even better version of
> figure 2 in the paper
> http://infolab.stanford.edu/~olston/publications/sigmod09.pdf
>
> [image: Inline image 1]
>
> As to the presentation of the data in an interface, I suggest you AVOID
> eclipse and the UI code to PigPen, as there is little utility in having
> this visualization there. Not all Pig users use Eclipse, and there is
> little utility in editing scripts in the diagrams. There is great utility
> in visualizing, understanding and debugging this way, but not so much in
> editing.
>
> On the other hand, anyone can edit Pig in their favorite tool and view
> their pig graph in a simple web application on their localhost by directing
> a web browser at it.
> This is why a simple, small web application seems
> best. You can use ruby/sinatra or python/bottle/flask or perl/catalyst to
> make a simple web app. Check out sigma.js for graph visualization:
> http://sigmajs.org/examples.html or http://neyric.github.com/wireit/ for
> something more fully featured.
>
> Perhaps the best plan is to fix ILLUSTRATE (see
> http://wiki.apache.org/pig/ExampleGenerator and talk to the guys at
> mortardata.com who have a patch for this), and edit the PigPen code to
> remove the Eclipse dependencies and have it output simple JSON for a web
> application to consume. It could write to a file, or you could create a
> simple web service that publishes JSON for the current pig session.
>
> Once we have JSON of ILLUSTRATE... getting a web visualization is easy. I
> can help, I've done it before in Cloud Stenography by parsing data in
> Grunt. Which you could do, btw. Old Perl code is available on github (see
> above link).
>
> Interested in thoughts of others.
>
> On Fri, Mar 23, 2012 at 11:21 PM, Shasha Liu <grassons...@gmail.com> wrote:
>
>> Hi Daniel,
>>
>> Thanks a lot for the reply.
>> I installed the latest Pig and read through the book of "programming in
>> pig".
>> I managed to use "-dot -out filename" to produce three graphs in dot file
>> format.
>>
>> Based on the existing dot file, my next question is what is the
>> requirement regarding a better visualizer?
>> Are we going to generate a picture (e.g., .png) for different plans
>> (logical plan, physical plan, map reduce plan), or provide a web interface
>> to visualize these graphs of plans?
>>
>> Best regards,
>> --
>> Shasha(Amy) Liu
>>
>>
>> On Sun, Mar 18, 2012 at 3:30 AM, Daniel Dai <da...@hortonworks.com> wrote:
>>
>>> See comments inline.
>>>
>>> On Sat, Mar 17, 2012 at 6:52 AM, grassonsand <grassons...@gmail.com>
>>> wrote:
>>> > Dear all,
>>> >
>>> > I am a Ph.D.
>>> > student in Computer Science and have 4-year Java programming
>>> > experience focusing on Java Web development.
>>> > In the candidate projects in PIG, I am interested in PIG-2586 (A better
>>> > plan/data flow visualizer) and PIG-2599 (Mavenize Pig).
>>> >
>>> > In my on-going research project, I am in charge of (1). web user interface
>>> > development and (2). build system. Now I am working on adding hadoop
>>> > capability to the project. The main reason I am interested in the PIG
>>> > project is that I can make a contribution to the PIG community based on my
>>> > previous experience, and learn from the participant in GSoC this year and
>>> > benefit my on-going research project at the same time.
>>> >
>>> > (1). User interface development
>>> > I have used several graphic libraries to visualize semantic data and our own
>>> > data set, e.g., Jung, graphviz, BIRT, and several plot plugins in jquery.
>>> > Therefore, I am interested in working on a new tool for PIG visualizer.
>>> > After looking through the bug issue, I have several questions:
>>> > (i) As both swing and javascript are mentioned, is this project a web or
>>> > standalone application?
>>> > (ii) As ruby-graphviz is included, Is ruby required for this project?
>>>
>>> I envision two visualizer components in Pig. One is a lightweight
>>> visualizer invoked by Grunt, which should be fast and concise, and
>>> integrated into explain command. The other is a standalone composer
>>> similar to PigPen, which should be much more powerful. PIG-2586 is intended
>>> to track the first, but Russell's comment is talking about the second.
>>> Both are acceptable as a GSoC project. I leave it to Russell.
>>>
>>> >
>>> > (2). Build system
>>> > The code base of my research project is 40K loc and the build script was
>>> > written in Ant. Part of my duty is to convert the ant build script to maven
>>> > and maintain the build script.
>>> > Therefore, Mavenize Pig is of interest to me
>>> > too. The build.xml in PIG project is more complicated than the one I worked
>>> > before. It includes ant, maven and ivy. Do we need to use maven to do all
>>> > the tasks and get rid of all the dependency on ant, maven and ivy?
>>>
>>> Yes
>>>
>>> >
>>> > Best regards
>>> > Shasha(Amy) Liu
>>>
>>
>>
>>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
>
--
Shasha(Amy) Liu
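
For reference, the "dot output to simple JSON for a web application" idea discussed in the thread could be sketched roughly as below. This is only an illustration under assumptions: the function names (dot_to_graph, dot_to_json), the sample graph, and the nodes/edges JSON shape are hypothetical, and the parser handles only plain "a -> b" edge lines, not the full DOT that Pig's -dot option may emit (subgraphs, attributes, and chained edges are ignored).

```python
import json
import re

# Matches simple DOT edge statements such as:  load_a -> filter_b;
# Assumption: node ids are bare words or double-quoted strings. Real Pig
# "-dot" output may be richer (subgraphs, ports, attributes), and a chained
# statement like "a -> b -> c" would only yield its first edge here.
EDGE_RE = re.compile(r'("[^"]+"|\w+)\s*->\s*("[^"]+"|\w+)')


def dot_to_graph(dot_text):
    """Return {"nodes": [...], "edges": [...]} for the edges found in dot_text."""
    nodes = []
    edges = []
    seen = set()
    for line in dot_text.splitlines():
        match = EDGE_RE.search(line)
        if match is None:
            continue  # not an edge line (e.g. "digraph plan {" or "}")
        source, target = (name.strip('"') for name in match.groups())
        for name in (source, target):
            if name not in seen:
                seen.add(name)
                nodes.append({"id": name, "label": name})
        edges.append({"id": "e%d" % len(edges), "source": source, "target": target})
    return {"nodes": nodes, "edges": edges}


def dot_to_json(dot_text):
    """Serialize the extracted graph as JSON text for a web page to load."""
    return json.dumps(dot_to_graph(dot_text), indent=2)


# Hypothetical, hand-written sample; not actual Pig output.
SAMPLE_DOT = """
digraph plan {
    load_a -> filter_b;
    filter_b -> "group c";
    "group c" -> store_d;
}
"""

if __name__ == "__main__":
    print(dot_to_json(SAMPLE_DOT))
```

A small web app (sinatra, bottle, flask, and so on, as suggested above) could serve this JSON at a URL and let a JavaScript graph library draw it. The exact field names a given library expects would need to be checked against its documentation.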