That's great, I'll review it now!

Russell Jurney http://datasyndrome.com

On Apr 6, 2012, at 7:03 AM, Shasha Liu <grassons...@gmail.com> wrote:

> Hi Russell,
>
> Based on the email discussions, I wrote my proposal for this pig visualizer
> project and submitted it to google-melange. Please take a look at it at your
> convenience; any further feedback or comments would be much appreciated.
>
> Thank you very much.
> Best,
>
> On Sun, Mar 25, 2012 at 9:25 PM, Russell Jurney <russell.jur...@gmail.com>
> wrote:
> I suggest you create a simple, minimal web application that visualizes a pig
> script file each time a url with the script filename is loaded.
>
> For instance, the process to use the tool might go like this:
>
> 1) Run pigvisualizer.(pl/py/rb) locally, at the start of your pig work session
> 2) Create a new pig script at /my/dir/filename.pig
> 3) Open http://localhost:4567/pigviz/my/dir/filename.pig in a web browser
> 4) See a javascript-based visualization of your pig script
> 5) Reload this web page each time you want to see a new visualization OR have
> the page try to reload the file periodically
>
> There are several sources of data:
>
> 1) Start a pig session, via grunt, PigServer or HCatalog, and use
> ILLUSTRATE/EXPLAIN. An old example of doing this is available at
> https://github.com/rjurney/Cloud-Stenography
> 2) Use the explain or -dot commands from pig command line. In looking at the
> dot output, the graph is not as helpful as I had thought :(
> 3) Use the PigPen code to get ILLUSTRATE data for visualization
>
> The ideal situation is that you get the data plan via EXPLAIN, and sample
> data via ILLUSTRATE, and combine them to produce an even better version of
> figure 2 in the paper
> http://infolab.stanford.edu/~olston/publications/sigmod09.pdf
>
> <image.png>
>
> As to the presentation of the data in an interface, I suggest you AVOID
> eclipse and the UI code to PigPen, as there is little utility in having this
> visualization there.
> Not all Pig users use Eclipse, and there is little
> utility in editing scripts in the diagrams. There is great utility in
> visualizing, understanding and debugging this way, but not so much in editing.
>
> On the other hand, anyone can edit Pig in their favorite tool and view their
> pig graph in a simple web application on their localhost by directing a web
> browser at it. This is why a simple, small web application seems best. You
> can use ruby/sinatra or python/bottle/flask or perl/catalyst to make a simple
> web app. Check out sigma.js for graph visualization:
> http://sigmajs.org/examples.html or http://neyric.github.com/wireit/ for
> something more fully featured.
>
> Perhaps the best plan is to fix ILLUSTRATE (see
> http://wiki.apache.org/pig/ExampleGenerator and talk to the guys at
> mortardata.com who have a patch for this), and edit the PigPen code to remove
> the Eclipse dependencies and have it output simple JSON for a web application
> to consume. It could write to a file, or you could create a simple web
> service that publishes JSON for the current pig session.
>
> Once we have JSON of ILLUSTRATE... getting a web visualization is easy. I
> can help, I've done it before in Cloud Stenography by parsing data in Grunt.
> Which you could do, btw. Old Perl code is available on github (see above
> link).
>
> Interested in thoughts of others.
>
> On Fri, Mar 23, 2012 at 11:21 PM, Shasha Liu <grassons...@gmail.com> wrote:
> Hi Daniel,
>
> Thanks a lot for the reply.
> I installed the latest Pig and read through the book "programming in pig".
> I managed to use "-dot -out filename" to produce three graphs in dot file
> format.
>
> Based on the existing dot file, my next question is: what are the
> requirements for a better visualizer?
> Are we going to generate a picture (e.g., .png) for different plans (logical
> plan, physical plan, map reduce plan), or provide a web interface to
> visualize these graphs of plans?
>
> Best regards,
> --
> Shasha(Amy) Liu
>
>
> On Sun, Mar 18, 2012 at 3:30 AM, Daniel Dai <da...@hortonworks.com> wrote:
> See comments inline.
>
> On Sat, Mar 17, 2012 at 6:52 AM, grassonsand <grassons...@gmail.com> wrote:
> > Dear all,
> >
> > I am a Ph.D. student in Computer Science and have four years of Java
> > programming experience focusing on Java Web development.
> > Among the candidate projects in PIG, I am interested in PIG-2586 (A better
> > plan/data flow visualizer) and PIG-2599 (Mavenize Pig).
> >
> > In my on-going research project, I am in charge of (1). web user interface
> > development and (2). build system. Now I am working on adding hadoop
> > capability to the project. The main reason I am interested in the PIG
> > project is that I can make a contribution to the PIG community based on my
> > previous experience, learn from participating in GSoC this year, and
> > benefit my on-going research project at the same time.
> >
> > (1). User interface development
> > I have used several graphic libraries to visualize semantic data and our own
> > data set, e.g., Jung, graphviz, BIRT, and several plot plugins in jquery.
> > Therefore, I am interested in working on a new visualizer tool for PIG.
> > After looking through the issue, I have several questions:
> > (i) As both swing and javascript are mentioned, is this project a web or
> > standalone application?
> > (ii) As ruby-graphviz is included, is ruby required for this project?
>
> I envision two visualizer components in Pig. One is a lightweight
> visualizer invoked by Grunt, which should be fast and concise, and
> integrated into the explain command. The other is a standalone composer
> similar to PigPen, which should be much more powerful. PIG-2586 is intended
> to track the first, but Russell's comment is talking about the second.
> Both are acceptable as a GSoC project. I leave it to Russell.
>
> >
> > (2). Build system
> > The code base of my research project is 40K loc and the build script was
> > written in Ant. Part of my duty is to convert the ant build script to maven
> > and maintain the build script. Therefore, Mavenize Pig is of interest to me
> > too. The build.xml in the PIG project is more complicated than the one I
> > worked on before. It includes ant, maven and ivy. Do we need to use maven to
> > do all the tasks and get rid of all the dependencies on ant, maven and ivy?
>
> Yes
>
> >
> > Best regards
> > Shasha(Amy) Liu
> >
>
> --
> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
>
>
> --
> Shasha(Amy) Liu
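
The flow Russell outlines in the thread above (open a URL that names a pig script, get back graph data that a JavaScript library can draw) might be sketched roughly as follows, in Python with only the standard library. Everything named here is an assumption made for illustration: the `/pigviz` prefix is borrowed from the example URL, the function names are hypothetical, and the DOT parsing only understands plain `a -> b` edge statements, one edge per statement. It has not been checked against the actual output of Pig's -dot option, and the nodes/edges JSON shape is a generic one rather than the input format of any particular graph library.

```python
import json
import re

# Assumption: URL prefix taken from the example URL in the thread.
URL_PREFIX = "/pigviz"

# Matches one simple DOT edge such as:  load -> filter;  or  "a b" -> "c d";
# Chained edges (a -> b -> c) and attribute lists are NOT handled.
EDGE_RE = re.compile(r'("[^"]+"|\w+)\s*->\s*("[^"]+"|\w+)')


def script_path_from_url(url_path):
    """Map /pigviz/my/dir/filename.pig to /my/dir/filename.pig."""
    if not url_path.startswith(URL_PREFIX + "/"):
        raise ValueError("not a pigviz URL: " + url_path)
    script_path = url_path[len(URL_PREFIX):]
    if not script_path.endswith(".pig"):
        raise ValueError("not a .pig script: " + script_path)
    return script_path


def dot_to_graph(dot_text):
    """Collect nodes and edges from simple 'a -> b' DOT statements."""
    nodes = []
    edges = []
    for source, target in EDGE_RE.findall(dot_text):
        source, target = source.strip('"'), target.strip('"')
        for name in (source, target):
            if name not in nodes:  # keep first-seen order, no duplicates
                nodes.append(name)
        edges.append({"source": source, "target": target})
    return {"nodes": [{"id": name} for name in nodes], "edges": edges}


def graph_json(dot_text):
    """Serialize the graph as JSON for a browser-side visualization."""
    return json.dumps(dot_to_graph(dot_text), sort_keys=True)
```

A small web handler (bottle, flask, or `http.server`) could call `script_path_from_url` on the request path, produce DOT for that script, and return `graph_json(...)` to the page. Because the URL names a file on disk, a tool like this should only listen on localhost.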