Re: GSoC 2015

Aditya Dhulipala Wed, 25 Feb 2015 21:56:39 -0800

Hi professor,

Thanks for your support!


I saw that a very large portion of the code was written by you, so I'm
guessing I will be interacting with you a lot on this (assuming my
application for GSoC goes through).

I've posted a link to a first version of the project proposal in the above
email. I'd like to get some feedback on it so that I can polish it.

Currently, I've written a general overview of the project and a broad
description of the tasks. I'm still getting comfortable with the codebase
and trying to come up with a schedule of work, milestones, deliverables,
etc. Can you have a look at the proposal and let me know what you think?

Also, can you give me some pointers on how to use OODT on a dataset? I
think I may still have the employment dataset from 572 last semester. Can
you give me some ideas on how to use it with OODT, just to get a sense of
how OODT works?

Here's one idea I'm thinking of:
I think OODT is useful for storing and indexing structured data, through
the use of metadata files (.met) and indexing based on them to answer
queries. For unstructured data, Lucene/Solr is a great tool. But the
employment dataset is neither completely unstructured nor consistently
structured. It's slightly structured, in the sense that we can guess which
fields are present in the records and move forward from there, right?
So far the only use case I can think of is using OODT to crawl the dataset
and push it into Solr, then query OODT through the command line (like in
the OODT wiki examples), i.e. using Solr syntax or SQL syntax.

Is this a valid use case for OODT? I think people would rather just query
Solr directly, right? Is there any reason for OODT to act as an in-between?

Any more ideas, comments, suggestions?

Thanks!

--
Aditya


adi

On Wed, Feb 25, 2015 at 7:19 PM, Chris Mattmann <[email protected]>
wrote:

> This sounds fabulous. I will be keen to help.
>
> ------------------------
> Chris Mattmann
> [email protected]
>
>
>
>
> -----Original Message-----
> From: Lewis John Mcgibbney <[email protected]>
> Reply-To: <[email protected]>
> Date: Wednesday, February 25, 2015 at 1:54 PM
> To: Aditya Dhulipala <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Re: GSoC 2015
>
> >I think that you should aim to implement it on all components, and we
> >should be looking to merge the code into an OODT branch incrementally.
> >It is OK if you do not get every component ported to Avro RPC; what is
> >important is that you put forward an optimistic but realistic GSoC
> >proposal. That is what we are looking for.
> >Thank you
> >Lewis
> >
> >
> >On Wed, Feb 25, 2015 at 1:48 PM, Aditya Dhulipala <[email protected]>
> >wrote:
> >
> >> Hi Lewis,
> >>
> >> Thanks for your reply!
> >>
> >> Your responses have helped immensely whenever I've been stuck!
> >>
> >> In the proposal that I was preparing, I had listed all the components
> >> that would require schema definitions, and then when I checked the
> >> OODT-658 patch, I realized that a lot of this was already done for the
> >> Gora project. But your email has clarified that I can use that as a
> >> starting point for the Avro project. This is extremely useful.
> >>
> >> And thanks for the rest of the info as well (about ensuring backwards
> >> compatibility, testing, and regression testing). Now I have a much
> >> better idea of how to formulate the proposal (and the project to-dos).
> >>
> >> I'll have it ready ASAP. I will post it to the group by the end of
> >> today so that I can get more feedback on it.
> >>
> >> I think I should at least be able to implement Avro RPC for one of the
> >> OODT components within the GSoC timeframe, right? That would involve:
> >> Define the schema
> >> Implement the services
> >> Write unit tests
> >> Regression test against XML-RPC
> >>
> >> Hopefully I will implement it for more than one component, but I'm
> >> still not able to estimate the workload. I'll continue reading up on
> >> this.
> >>
> >> I'll continue to work on the proposal and keep you updated.
> >>
> >> Thanks for all the help! I think if I start early, I can spend the
> >> summer coding from the beginning.
> >>
> >> Thanks!
> >>
> >> --
> >> Aditya
> >>
> >>
> >> adi
> >>
> >> On Wed, Feb 25, 2015 at 9:34 AM, Lewis John Mcgibbney <
> >> [email protected]> wrote:
> >>
> >>> Hi Adi,
> >>>
> >>> On Wed, Feb 25, 2015 at 12:34 AM, Aditya Dhulipala <[email protected]>
> >>> wrote:
> >>>
> >>>> Hi Lewis,
> >>>>
> >>>> I was going through the patch you posted earlier, OODT-658:
> >>>> https://issues.apache.org/jira/browse/OODT-658
> >>>>
> >>>> I think this is a substantial part of the project we're currently
> >>>> talking about (XML-RPC overhaul).
> >>>>
> >>>
> >>> Substantial may be a wee bit optimistic ;) But yes, a significant
> >>> portion of the thinking about the OODT data structures logic has been
> >>> done. We DO need to implement Metadata in exactly the right way without
> >>> losing existing functionality, so please begin to think about that.
> >>>
> >>>
> >>>
> >>> https://github.com/apache/oodt/blob/trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/Metadata.java
> >>>
> >>>
> >>>> My understanding is that this patch was implemented to make Apache
> >>>> Gora communicate with OODT, which is why you've implemented the schema
> >>>> definitions for all the data structures used by OODT.
> >>>>
> >>>
> >>> Correct
> >>>
> >>>
> >>>> Gora generates some statically typed code from this schema
> >>>>
> >>>
> >>> Using the GoraCompiler
> >>> http://gora.apache.org/current/compiler.html,
> >>> invoked via CompilerCLI
> >>>
> >>>
> >>>
> >>>> and the next step is to implement OODT logic to store the data in Gora
> >>>> (as opposed to MySQL or Solr)
> >>>>
> >>>
> >>> YES. This will tidy up A LOT of the current configuration. It will also
> >>> give us a unified and well-documented way of configuring the mappings
> >>> and datastore-specific configuration. All of the Gora datastores are
> >>> documented here:
> >>> http://gora.apache.org/current/index.html
> >>> I've been hacking away on documentation for Gora for about a year, so it
> >>> is now relatively OK. I hope you find it useful.
> >>>
> >>>
> >>>>
> >>>> So from the viewpoint of the project we're talking about, i.e.
> >>>> replacing XML-RPC with Avro, the schema definition part is pretty much
> >>>> done (or almost done? Do we need to define it within OODT as well?).
> >>>>
> >>>
> >>> Note that NONE of the Avro RPC logic is implemented, so it is nowhere
> >>> near done ;) The core project definition is still to be addressed, and
> >>> I am nearly 100% sure that we will have some tricky issues to address
> >>> regarding 1) maintaining backwards compatibility as closely as
> >>> possible, 2) documenting the entire Avro RPC communications within
> >>> OODT, 3) hooking up all services, 4) testing the new implementation,
> >>> 5) regression testing it against the existing XML-RPC layer, and
> >>> 6) setting a roadmap for deprecation and eventual removal of the
> >>> XML-RPC material.
> >>>
> >>>
> >>>> The next step would be to define the RPC logic for the client-server
> >>>> communication within OODT itself, i.e. within filemgr, workflowmgr, etc.
> >>>>
> >>>
> >>> Correct, this should make up the majority of your proposal OK.
> >>>
> >>>
> >>>>
> >>>> Am I correct in understanding this?
> >>>>
> >>>>
> >>> Yes, and thank you for joining the dots. It is nice to see a student
> >>> interpreting and investigating the problem this much before the project
> >>> starts. I am really looking forward to this now.
> >>> Thanks
> >>> Lewis
> >>>
> >>
> >>
> >
> >
> >--
> >*Lewis*
>
>
>
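
Lewis's point about implementing Metadata "without losing existing functionality", together with the plan to regression test Avro RPC against XML-RPC, can be illustrated with a small self-contained sketch. It uses neither OODT nor Avro: the three codecs are hypothetical stand-ins that only model the shape of the data on the wire. The XML-RPC side is assumed to be a Hashtable of Vectors; the Avro side assumes a schema of a map whose values are arrays of strings, with keys and values held as CharSequence, since Avro's Java bindings can hand strings back as `Utf8` rather than `String`. The check runs the same metadata through a baseline codec and a candidate codec and compares what comes out.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Vector;

/** Hypothetical sketch; none of these types are real OODT or Avro APIs. */
public class MetadataRoundTripCheck {

    /** Hypothetical codec: metadata -> wire form -> metadata. */
    interface MetadataCodec {
        Object encode(Map<String, List<String>> metadata);

        Map<String, List<String>> decode(Object wire);
    }

    /** Stand-in for an XML-RPC style payload, assumed to be a Hashtable of Vectors. */
    static class XmlRpcStyleCodec implements MetadataCodec {
        public Object encode(Map<String, List<String>> metadata) {
            Hashtable<String, Vector<String>> table = new Hashtable<>();
            metadata.forEach((key, values) -> table.put(key, new Vector<>(values)));
            return table;
        }

        @SuppressWarnings("unchecked")
        public Map<String, List<String>> decode(Object wire) {
            Map<String, List<String>> out = new LinkedHashMap<>();
            ((Hashtable<String, Vector<String>>) wire)
                    .forEach((key, values) -> out.put(key, new ArrayList<>(values)));
            return out;
        }
    }

    /**
     * Stand-in for an Avro style payload, assuming a schema like
     * {"type": "map", "values": {"type": "array", "items": "string"}}.
     * StringBuilder plays the role of a non-String CharSequence such as Avro's Utf8,
     * so decode() has to convert with toString().
     */
    static class AvroStyleCodec implements MetadataCodec {
        public Object encode(Map<String, List<String>> metadata) {
            Map<CharSequence, List<CharSequence>> wire = new LinkedHashMap<>();
            metadata.forEach((key, values) -> {
                List<CharSequence> copy = new ArrayList<>();
                values.forEach(v -> copy.add(new StringBuilder(v)));
                wire.put(new StringBuilder(key), copy);
            });
            return wire;
        }

        @SuppressWarnings("unchecked")
        public Map<String, List<String>> decode(Object wire) {
            Map<String, List<String>> out = new LinkedHashMap<>();
            ((Map<CharSequence, List<CharSequence>>) wire).forEach((key, values) -> {
                List<String> copy = new ArrayList<>();
                values.forEach(v -> copy.add(v.toString()));
                out.put(key.toString(), copy);
            });
            return out;
        }
    }

    /** Deliberately lossy codec: keeps only the first value of each key. */
    static class FirstValueOnlyCodec implements MetadataCodec {
        public Object encode(Map<String, List<String>> metadata) {
            Map<String, String> wire = new LinkedHashMap<>();
            metadata.forEach((key, values) -> wire.put(key, values.isEmpty() ? "" : values.get(0)));
            return wire;
        }

        @SuppressWarnings("unchecked")
        public Map<String, List<String>> decode(Object wire) {
            Map<String, List<String>> out = new LinkedHashMap<>();
            ((Map<String, String>) wire).forEach((key, value) -> out.put(key, List.of(value)));
            return out;
        }
    }

    /** True if the candidate decodes to exactly what the baseline decodes to. */
    static boolean matchesBaseline(MetadataCodec baseline, MetadataCodec candidate,
                                   Map<String, List<String>> metadata) {
        Map<String, List<String>> expected = baseline.decode(baseline.encode(metadata));
        Map<String, List<String>> actual = candidate.decode(candidate.encode(metadata));
        return expected.equals(actual);
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", List.of("Data Engineer"));
        metadata.put("keywords", List.of("employment", "survey"));

        MetadataCodec baseline = new XmlRpcStyleCodec();
        System.out.println(matchesBaseline(baseline, new AvroStyleCodec(), metadata)); // true
        System.out.println(matchesBaseline(baseline, new FirstValueOnlyCodec(), metadata)); // false
    }
}
```

The deliberately lossy `FirstValueOnlyCodec` shows the kind of regression such a check is meant to catch: a multi-valued key silently collapsing to a single value.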
