Hi Keith,

Thanks for contacting us. Yes, this is precisely the type of thing that
OODT can help you with.

As a start, I would recommend reading this guide, which shows you
how to use CAS-PGE, the algorithm wrapper. You can chain several of
these wrappers into a workflow to build out your production pipeline:

https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
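
To give you a quick flavor of what you'd be writing (a rough sketch from
memory, so defer to the Learn-by-Example page above for the exact element
names), a CAS-PGE config is an XML file that tells the wrapper what command
to run and where its output lands; the [bracketed] values are filled in
from workflow metadata at runtime:

    <?xml version="1.0" encoding="UTF-8"?>
    <pgeConfig>
      <!-- Run your executable (run_gpu_search.sh is a made-up name);
           [JobDir], [InputFile], and [OutputDir] are resolved from
           workflow metadata when the task runs -->
      <exe dir="[JobDir]" shellType="/bin/sh">
        <cmd>./run_gpu_search.sh [InputFile] [OutputDir]</cmd>
      </exe>
      <!-- Tell CAS-PGE where the products end up so the crawler can
           pick them up and ingest them into the archive -->
      <output>
        <dir path="[OutputDir]" createBeforeExe="true"/>
      </output>
    </pgeConfig>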

In addition to the above guide, I would start by installing OODT RADiX, the
quick installer:

https://cwiki.apache.org/confluence/display/OODT/RADiX+Powered+By+OODT
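
RADiX is generated from a Maven archetype, so the install boils down to
roughly one command (if memory serves; the page above has the exact
archetype coordinates, and the version numbers below are placeholders):

    mvn archetype:generate \
      -DarchetypeGroupId=org.apache.oodt \
      -DarchetypeArtifactId=radix-archetype \
      -DarchetypeVersion=1.0 \
      -DoodtVersion=1.0 \
      -DgroupId=com.mycompany \
      -DartifactId=oodt \
      -Dversion=0.1

You then build the generated project with mvn package and deploy the
distribution it produces.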

Once RADiX is installed, edit your CAS-PGE algorithm wrappers, write the
config files, and then test out your production pipeline. If you run into
trouble with CAS-PGE, here's an FAQ:

https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Help+and+Documentation

If you want to understand more about how metadata flows in the system, you
can check this out:

https://cwiki.apache.org/confluence/display/OODT/Understanding+the+flow+of+Metadata+during+PGE+based+Processing

and this:

https://cwiki.apache.org/confluence/display/OODT/Understanding+CAS-PGE+Metadata+Precendence
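
The gist is that a PGE sees a merged view of metadata from several sources
(dynamic workflow metadata, custom metadata declared in the PGE config, and
static/system metadata), with a defined override order that the precedence
page spells out. In the config itself, you can derive new keys from upstream
ones with a customMetadata block, roughly like this (the key names here are
invented for illustration):

    <customMetadata>
      <!-- Derive a PGE parameter from an upstream key; SkyDirection
           would come from your ingest/crawler metadata extractor -->
      <metadata key="GPUPointing" val="[SkyDirection]"/>
    </customMetadata>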

Finally, there are two examples of full-up OODT pipelines/deployments. The
first is DRAT, which does large-scale code license analysis via OODT
MapReduce (there is a paper in the GitHub repo you can check out):

http://github.com/chrismattmann/drat/

The second, Big Translate, a large-scale MapReduce machine translation
pipeline, is here:

http://github.com/chrismattmann/bigtranslate/

If we can help more, let us know.

Cheers,
Chris



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 

On 4/3/17, 6:36 PM, "Keith Bannister" <[email protected]> wrote:

    Hi,
    
    I'm trying to work out whether OODT is the right framework for me.
    
    I have a radio astronomy application. Data rate is roughly 12 TB/day. 
    Data format is a custom one with all sorts of metadata flying around 
    (including sky direction in lat/long coordinates).
    
    The raw data is pretty huge, and I can't store it on an OODT machine. 
    The big disk I have access to won't run OODT.
    
    Basically I want to:
    
    1. Save the metadata of the raw data into an index somewhere.
    2. Run some GPU codes over the raw data. The GPU code parameters should 
    be set based on the metadata.
    3. Save the GPU results in an archive, with even more metadata.
    4. Copy the raw data to a remote disk with a long-running bbcp task.
    5. Delete the raw data, but keep the GPU results and all the metadata.
    
    I'm having trouble finding the right documentation that describes how I 
    can do this. Can you give me a top-level page? (I've looked at the wiki, 
    but it's a bit tricky to work out where to start.)
    
    K
    
    
    -- 
    KEITH BANNISTER | Principal Research Engineer
    CSIRO Astronomy and Space Science
    T +61 2 9372 4295
    E [email protected]
    
