Hi Adi,

Please see DRAT for a flagship application that showcases OODT:
https://github.com/chrismattmann/drat

On Wed, Feb 25, 2015 at 9:54 PM, Aditya Dhulipala <[email protected]> wrote:
> Hi professor,
>
> Thanks for your support!
>
> I saw that a very large portion of the code was written by you. I'm
> guessing I will be interacting with you a lot on this (assuming my
> application for GSoC goes through).
>
> I've posted a link to a first version of the project proposal in the
> above email. I'd like to get some feedback on it so that I can polish it.
>
> Currently, I've written about the general overview of the project and a
> broad description of the tasks. I'm still trying to get comfortable with
> the codebase and to come up with a schedule of work, milestones &
> deliverables, etc. Can you have a look at the proposal and let me know
> what you think about it?
>
> Also, can you give me some pointers on how to use OODT on some dataset?
> I think I may have the employment dataset from 572 last semester. Can you
> give me some ideas on how to use it with OODT, just to get a sense of how
> OODT works?
>
> Here's one I'm thinking of:
> I think OODT is useful for storing and indexing structured data, through
> the use of metadata files (.met) and indexing based on these to answer
> queries. For unstructured data, Lucene/Solr is a great tool. But the
> employment dataset is neither completely unstructured nor consistently
> structured. It's slightly structured in the sense that we can guess what
> fields are there in the records and move forward from there, right?
> So far the only use case I can think of is using OODT to crawl the
> dataset and push it into Solr, then querying OODT through the command
> line (like in the OODT wiki examples), i.e. using Solr syntax or SQL
> syntax.
>
> Is this a valid use case for OODT? I think people would rather just query
> Solr directly, right? Is there any reason for OODT to act as an
> in-between?
>
> Any more ideas, comments, suggestions?
>
> Thanks!
>
> --
> Aditya
>
> adi
>
> On Wed, Feb 25, 2015 at 7:19 PM, Chris Mattmann <[email protected]> wrote:
>
> > This sounds fabulous. I will be keen to help.
> >
> > ------------------------
> > Chris Mattmann
> > [email protected]
> >
> > -----Original Message-----
> > From: Lewis John Mcgibbney <[email protected]>
> > Reply-To: <[email protected]>
> > Date: Wednesday, February 25, 2015 at 1:54 PM
> > To: Aditya Dhulipala <[email protected]>
> > Cc: "[email protected]" <[email protected]>
> > Subject: Re: GSoC 2015
> >
> > >I think that you should aim to implement it on all components and we
> > >should be looking to merge the code into OODT (branch) incrementally.
> > >It is OK that you may not get every component ported to Avro RPC; what
> > >is important is that there is an optimistic but realistic GSoC
> > >proposal put forward. That is what we are looking for.
> > >Thank you
> > >Lewis
> > >
> > >On Wed, Feb 25, 2015 at 1:48 PM, Aditya Dhulipala <[email protected]>
> > >wrote:
> > >
> > >> Hi Lewis,
> > >>
> > >> Thanks for your reply!
> > >>
> > >> Your responses have helped immensely when I'm stuck on something!
> > >>
> > >> In the proposal that I was preparing I had listed out all the
> > >> components that would require schema definitions, and then when I
> > >> checked the OODT patch 658, I realized that a lot of this was done
> > >> for the Gora project. But your email has clarified that I can use
> > >> that as a starting point for the Avro project. This is extremely
> > >> useful.
> > >>
> > >> And thanks for the rest of the info as well (about ensuring
> > >> backwards compatibility, testing, regression testing). Now I have a
> > >> much better idea of formulating a proposal (and the project to-dos
> > >> also).
> > >>
> > >> I'll have it ready ASAP.
> > >> I will post it to the group by end of today so that I can get more
> > >> feedback on it.
> > >>
> > >> I think I should at least be able to define Avro RPC implementations
> > >> for one of the components of OODT in the GSoC duration, right?
> > >> Define the schema
> > >> Implement the services
> > >> Write unit tests
> > >> Regression test against XML-RPC
> > >>
> > >> Hopefully I should implement it for more than one component, but I'm
> > >> still not able to estimate the workload. I'll continue reading up on
> > >> this.
> > >>
> > >> I'll continue to work on the proposal and keep you updated.
> > >>
> > >> Thanks for all the help! I think if I start early, then I can spend
> > >> the summer coding from the beginning.
> > >>
> > >> Thanks!
> > >>
> > >> --
> > >> Aditya
> > >>
> > >> adi
> > >>
> > >> On Wed, Feb 25, 2015 at 9:34 AM, Lewis John Mcgibbney
> > >> <[email protected]> wrote:
> > >>
> > >>> Hi Adi,
> > >>>
> > >>> On Wed, Feb 25, 2015 at 12:34 AM, Aditya Dhulipala
> > >>> <[email protected]> wrote:
> > >>>
> > >>>> Hi Lewis,
> > >>>>
> > >>>> I was going through the patch you posted earlier, OODT-658:
> > >>>> https://issues.apache.org/jira/browse/OODT-658
> > >>>>
> > >>>> I think this is a substantial part of the project we're currently
> > >>>> talking about (XML-RPC overhaul).
> > >>>
> > >>> Substantial may be a wee bit optimistic ;) But yes, a significant
> > >>> portion of the thinking into the OODT data structures logic has
> > >>> been done. We DO need to implement Metadata in exactly the right
> > >>> way without losing existing functionality, so please begin to think
> > >>> about that.
> > >>>
> > >>> https://github.com/apache/oodt/blob/trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/Metadata.java
> > >>>
> > >>>> My understanding is that this patch was implemented to make Apache
> > >>>> Gora communicate with OODT, so that's why you've implemented the
> > >>>> schema definitions for all the data structures used by OODT.
> > >>>
> > >>> Correct
> > >>>
> > >>>> Gora generates some statically typed code from this schema
> > >>>
> > >>> Using the GoraCompiler
> > >>> http://gora.apache.org/current/compiler.html,
> > >>> invoked via CompilerCLI
> > >>>
> > >>>> and the next step is to implement OODT logic to store the data in
> > >>>> Gora (as opposed to MySQL or Solr)
> > >>>
> > >>> YES. This will tidy A LOT of the current configuration up. We will
> > >>> also have a unified and well-documented manner for configuring the
> > >>> mappings and datastore-specific configuration. All of the Gora
> > >>> datastores are documented here:
> > >>> http://gora.apache.org/current/index.html
> > >>> I've been hacking away on documentation for Gora for about a year,
> > >>> so it is now relatively OK. I hope you find it useful.
> > >>>
> > >>>> So from the viewpoint of the project we're talking about, i.e.
> > >>>> replacing XML-RPC with Avro, the schema definition part is pretty
> > >>>> much done (or almost done? Need to define it within OODT as
> > >>>> well?).
> > >>>
> > >>> Note that NONE of the Avro RPC logic is implemented.
> > >>> So it is nowhere near done ;) The core project definition is still
> > >>> to be addressed, and I am nearly 100% sure that we will have some
> > >>> tricky issues to address regarding 1) maintaining as close to
> > >>> backwards compatibility as possible, 2) documenting the entire Avro
> > >>> RPC communications within OODT, 3) hooking up all services,
> > >>> 4) testing the new implementation, 5) regression testing it against
> > >>> the existing XML-RPC layer, 6) setting a roadmap for deprecation
> > >>> and eventual removal of the XML-RPC material.
> > >>>
> > >>>> The next step would be to define RPC logic for the client-server
> > >>>> communication within OODT itself, i.e. within filemgr, workflowmgr
> > >>>> etc.
> > >>>
> > >>> Correct, this should make up the majority of your proposal, OK.
> > >>>
> > >>>> Am I correct in understanding this?
> > >>>
> > >>> Yes, and thank you for joining the dots. It is nice to see a
> > >>> student interpreting and investigating the problem this much prior
> > >>> to the project starting. I am really looking forward to this now.
> > >>> Thanks
> > >>> Lewis
> > >
> > >--
> > >*Lewis*

--
*Lewis*
