Hi Lewis,

Ok. I'll check it out. Thanks!

--
adi

On Thu, Feb 26, 2015 at 10:32 AM, Lewis John Mcgibbney <
[email protected]> wrote:

> Hi Adi,
> Please see DRAT for a flagship application which showcases OODT:
> https://github.com/chrismattmann/drat
>
> On Wed, Feb 25, 2015 at 9:54 PM, Aditya Dhulipala <[email protected]>
> wrote:
>
> > Hi professor,
> >
> > Thanks for your support!
> >
> > I saw that a very large portion of the code was written by you. I'm
> > guessing I will be interacting with you a lot on this (assuming my
> > application for GSoC goes through)
> >
> > I've posted a link to a first version of the project proposal in the
> > above email. I'd like to get some feedback on it so that I can polish it.
> >
> > Currently, I've written about the general overview of the project and a
> > broad description of the tasks. I'm still trying to get comfortable with
> > the codebase and come up with a schedule of work, milestones,
> > deliverables, etc. Can you have a look at the proposal and let me know
> > what you think about it?
> >
> > Also, can you give me some pointers on how to use OODT on a dataset? I
> > think I may have the employment dataset from 572 last semester. Can you
> > give me some ideas on how to use it with OODT, just to get a sense of how
> > OODT works?
> >
> > Here's one I'm thinking of:
> > I think OODT is useful for storing and indexing structured data through
> > the use of metadata files (.met), and for answering queries based on that
> > index. For unstructured data, Lucene/Solr is a great tool. But the
> > employment dataset is neither completely unstructured nor consistently
> > structured. It's slightly structured in the sense that we can guess what
> > fields are in the records and move forward from there, right?
> > So far the only use case I can think of is using OODT to crawl the
> > dataset and push it into Solr, then querying OODT through the command line
> > (like in the OODT wiki examples), i.e. using Solr syntax or SQL syntax.
> >
> > Is this a valid use case for OODT? I think people would rather just query
> > Solr directly, right? Is there any reason for OODT to act as an
> > in-between?
> >
> > Any more ideas, comments, suggestions?
> >
> > Thanks!
> >
> > --
> > Aditya
> >
> >
> > adi
> >
> > On Wed, Feb 25, 2015 at 7:19 PM, Chris Mattmann <
> [email protected]>
> > wrote:
> >
> > > This sounds fabulous. I will be keen to help.
> > >
> > > ------------------------
> > > Chris Mattmann
> > > [email protected]
> > >
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Lewis John Mcgibbney <[email protected]>
> > > Reply-To: <[email protected]>
> > > Date: Wednesday, February 25, 2015 at 1:54 PM
> > > To: Aditya Dhulipala <[email protected]>
> > > Cc: "[email protected]" <[email protected]>
> > > Subject: Re: GSoC 2015
> > >
> > > >I think that you should aim to implement it on all components, and we
> > > >should be looking to merge the code into OODT (branch) incrementally.
> > > >It is OK that you may not get every component ported to Avro RPC; what
> > > >is important is that there is an optimistic but realistic GSoC proposal
> > > >put forward.
> > > >That is what we are looking for.
> > > >Thank you
> > > >Lewis
> > > >
> > > >
> > > >On Wed, Feb 25, 2015 at 1:48 PM, Aditya Dhulipala <[email protected]>
> > > >wrote:
> > > >
> > > >> Hi Lewis,
> > > >>
> > > >> Thanks for your reply!
> > > >>
> > > >> Your responses have helped immensely when I'm stuck on something!
> > > >>
> > > >> In the proposal that I was preparing I had listed out all the
> > > >> components that would require schema definitions, and then when I
> > > >> checked the OODT patch 658, I realized that a lot of this was done for
> > > >> the Gora project. But your email has clarified that I can use that as
> > > >> a starting point for the Avro project. This is extremely useful.
> > > >>
> > > >> And thanks for the rest of the info as well (about ensuring backwards
> > > >> compatibility, testing, regression testing). Now I have a much better
> > > >> idea of how to formulate a proposal (and the project to-dos as well).
> > > >>
> > > >> I'll have it ready ASAP. I will post it to the group by end of today
> > > >> so that I can get more feedback on it.
> > > >>
> > > >> I think I should at least be able to define Avro RPC implementations
> > > >> for one of the components of OODT in the GSoC duration, right?
> > > >> Define the schema
> > > >> Implement the services
> > > >> Write unit tests
> > > >> Regression test against XML-RPC
> > > >>
> > > >> Hopefully I can implement it for more than one component, but I'm
> > > >> still not able to estimate the workload. I'll continue reading up on
> > > >> this.
> > > >>
> > > >> I'll continue to work on the proposal and keep you updated.
> > > >>
> > > >> Thanks for all the help! I think if I start early, then I can spend
> > > >> the summer coding from the beginning.
> > > >>
> > > >> Thanks!
> > > >>
> > > >> --
> > > >> Aditya
> > > >>
> > > >>
> > > >> adi
> > > >>
> > > >> On Wed, Feb 25, 2015 at 9:34 AM, Lewis John Mcgibbney <
> > > >> [email protected]> wrote:
> > > >>
> > > >>> Hi Adi,
> > > >>>
> > > >>> On Wed, Feb 25, 2015 at 12:34 AM, Aditya Dhulipala <
> [email protected]
> > >
> > > >>> wrote:
> > > >>>
> > > >>>> Hi Lewis,
> > > >>>>
> > > >>>> I was going through the patch you posted earlier, OODT-658:
> > > >>>> https://issues.apache.org/jira/browse/OODT-658
> > > >>>>
> > > >>>> I think this is a substantial part of the project we're currently
> > > >>>> talking about (XML-RPC overhaul).
> > > >>>>
> > > >>>
> > > >>> Substantial may be a wee bit optimistic ;) But yes, a significant
> > > >>> portion of the thinking into the OODT data structures logic has been
> > > >>> done. We DO need to implement Metadata in exactly the right way
> > > >>> without losing existing functionality, so please begin to think
> > > >>> about that.
> > > >>>
> > > >>>
> > > >>>
> > > >>> https://github.com/apache/oodt/blob/trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/Metadata.java
> > > >>>
> > > >>>
> > > >>>> My understanding is that this patch was implemented to make Apache
> > > >>>> Gora communicate with OODT, so that's why you've implemented the
> > > >>>> schema definitions for all the data structures used by OODT.
> > > >>>>
> > > >>>
> > > >>> Correct
> > > >>>
> > > >>>
> > > >>>> Gora generates some statically typed code from this schema
> > > >>>>
> > > >>>
> > > >>> Using the GoraCompiler
> > > >>> http://gora.apache.org/current/compiler.html,
> > > >>> invoked via CompilerCLI
> > > >>>
> > > >>>
> > > >>>
> > > >>>> and the next step is to implement OODT logic to store the data in
> > > >>>> Gora (as opposed to MySQL or Solr)
> > > >>>>
> > > >>>
> > > >>> YES. This will tidy A LOT of the current configuration up. It will
> > > >>> also provide a unified and well-documented manner for configuring the
> > > >>> mappings and datastore-specific configuration. All of the Gora
> > > >>> datastores are documented here:
> > > >>> http://gora.apache.org/current/index.html
> > > >>> I've been hacking away on documentation for Gora for about a year, so
> > > >>> it is now relatively OK. I hope you find it useful.
> > > >>>
> > > >>>
> > > >>>>
> > > >>>> So from the viewpoint of the project we're talking about, i.e.
> > > >>>> replacing XML-RPC with Avro,
> > > >>>> the schema definition part is pretty much done (or almost done?
> > > >>>> Need to define it within OODT as well?).
> > > >>>>
> > > >>>
> > > >>> Note that NONE of the Avro RPC logic is implemented, so it is
> > > >>> nowhere near done ;) The core project definition is still to be
> > > >>> addressed, and I am nearly 100% sure that we will have some tricky
> > > >>> issues to address regarding 1) maintaining as close to backwards
> > > >>> compatibility as possible, 2) documenting the entire Avro RPC
> > > >>> communications within OODT, 3) hooking up all services, 4) testing
> > > >>> the new implementation, 5) regression testing it against the existing
> > > >>> XML-RPC layer, and 6) setting a roadmap for deprecation and eventual
> > > >>> removal of the XML-RPC material.
> > > >>>
> > > >>>
> > > >>>> The next step would be to define RPC logic for the client-server
> > > >>>> communication within OODT itself, i.e. within filemgr, workflowmgr,
> > > >>>> etc.
> > > >>>>
> > > >>>
> > > >>> Correct, this should make up the majority of your proposal OK.
> > > >>>
> > > >>>
> > > >>>>
> > > >>>> Am I correct in understanding this?
> > > >>>>
> > > >>>>
> > > >>> Yes, and thank you for joining the dots. It is nice to see a student
> > > >>> interpreting and investigating the problem this much prior to the
> > > >>> project starting. I am really looking forward to this now.
> > > >>> Thanks
> > > >>> Lewis
> > > >>>
> > > >>
> > > >>
> > > >
> > > >
> > > >--
> > > >*Lewis*
> > >
> > >
> > >
> >
>
>
>
> --
> *Lewis*
>
