This presentation is also useful for understanding the OODT Workflow Manager:

https://www.slideshare.net/chrismattmann/wengines-workflows-and-2-years-of-advanced-data-processing-in-apache-oodt
 

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 

On 4/3/17, 6:51 PM, "Mattmann, Chris A (3010)" <[email protected]> 
wrote:

    Hi Keith,
    
    Thanks for contacting us. Yes this is precisely the type of thing that
    OODT can help you with.
    
    As a start, I would recommend reading this guide that shows you
    how to use the algorithm wrapper, CAS-PGE. You can build a workflow
    of several of these wrappers to push out your production pipeline:
    
    https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
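    
    For a feel of how the pieces fit together, here is a rough, untested Java
    sketch of kicking off a workflow of CAS-PGE tasks by sending an event to
    the Workflow Manager (the event name "RunPipeline" and the metadata key
    are made up for illustration; port 9001 is the usual RADiX default):
    
        import java.net.URL;
        
        import org.apache.oodt.cas.metadata.Metadata;
        import org.apache.oodt.cas.workflow.system.XmlRpcWorkflowManagerClient;
        
        public class TriggerPipeline {
            public static void main(String[] args) throws Exception {
                // Connect to the Workflow Manager XML-RPC endpoint.
                XmlRpcWorkflowManagerClient wmgr =
                    new XmlRpcWorkflowManagerClient(new URL("http://localhost:9001"));
        
                // Metadata handed to the workflow; each CAS-PGE task in the
                // chain can read these keys from its dynamic metadata.
                Metadata met = new Metadata();
                met.addMetadata("InputProduct", "/data/raw/scan-0001.dat");
        
                // "RunPipeline" stands in for whatever event you map to your
                // workflow of CAS-PGE tasks in the Workflow Manager policy.
                wmgr.sendEvent("RunPipeline", met);
            }
        }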
    
    In addition to the above guide, I would start by installing OODT RADiX, the
    quick installer:
    
    https://cwiki.apache.org/confluence/display/OODT/RADiX+Powered+By+OODT
    
    Once RADiX is installed, edit your CAS-PGE algorithm wrappers and write
    some config files, then test out your production pipeline. If you run into
    trouble with your CAS-PGE, here's an FAQ:
    
    https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Help+and+Documentation
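    
    While you are testing, it can also help to poke at the File Manager
    catalog programmatically. A rough, untested sketch (the product name is
    made up; port 9000 is the usual RADiX default):
    
        import java.net.URL;
        
        import org.apache.oodt.cas.filemgr.system.XmlRpcFileManagerClient;
        
        public class CheckCatalog {
            public static void main(String[] args) throws Exception {
                // Connect to the File Manager XML-RPC endpoint.
                XmlRpcFileManagerClient fm =
                    new XmlRpcFileManagerClient(new URL("http://localhost:9000"));
        
                // Did the pipeline run catalog a product with this name?
                System.out.println(fm.hasProduct("scan-0001.dat"));
            }
        }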
    
    If you want to understand more about how metadata flows in the system,
    you can check this out:
    
    https://cwiki.apache.org/confluence/display/OODT/Understanding+the+flow+of+Metadata+during+PGE+based+Processing
    
    and this:
    
    https://cwiki.apache.org/confluence/display/OODT/Understanding+CAS-PGE+Metadata+Precendence
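    
    The short version is that metadata travels through the system as a simple
    multi-valued key/value object, and later steps can override values set by
    earlier ones. Roughly (the keys here are made up):
    
        import org.apache.oodt.cas.metadata.Metadata;
        
        public class MetadataDemo {
            public static void main(String[] args) {
                Metadata met = new Metadata();
                // e.g. extracted when the raw product is crawled/ingested
                met.addMetadata("SkyDirection", "12.5,-30.2");
                // e.g. a default set in a CAS-PGE config file
                met.addMetadata("GpuParamSet", "default");
                // a later writer overrides it; the precedence rules decide who wins
                met.replaceMetadata("GpuParamSet", "highres");
                System.out.println(met.getMetadata("GpuParamSet")); // "highres"
            }
        }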
    
    Finally, there are two examples of full-up OODT pipelines/deployments. The
    first is DRAT, which does large-scale code license analysis via OODT
    MapReduce (there is a paper in the GitHub repo you can check out):
    
    http://github.com/chrismattmann/drat/
    
    The second, Big Translate, a large-scale MapReduce machine translation
    pipeline, is here:
    
    http://github.com/chrismattmann/bigtranslate/
    
    If we can help more, let us know.
    
    Cheers,
    Chris
    
    
    
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    Chris Mattmann, Ph.D.
    Principal Data Scientist, Engineering Administrative Office (3010)
    Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
    NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
    Office: 180-503E, Mailstop: 180-503
    Email: [email protected]
    WWW:  http://sunset.usc.edu/~mattmann/
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    Director, Information Retrieval and Data Science Group (IRDS)
    Adjunct Associate Professor, Computer Science Department
    University of Southern California, Los Angeles, CA 90089 USA
    WWW: http://irds.usc.edu/
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     
    
    On 4/3/17, 6:36 PM, "Keith Bannister" <[email protected]> wrote:
    
        Hi,
        
        I'm trying to work out whether OODT is the right framework for me.
        
        I have a radio astronomy application. Data rate is roughly 12 TB/day. 
        Data format is a custom one with all sorts of metadata flying around 
        (including sky direction in lat/long coordinates).
        
        The raw data is pretty huge, and I can't store it on an OODT machine. 
        The big disk I have access to won't run OODT.
        
        Basically I want to:
        
        1. Save the metadata of the raw data into an index somewhere.
        2. Run some GPU codes over the raw data. The GPU code parameters should 
        be set based on the metadata.
        3. Save the GPU results in an archive, with even more metadata
        4. Copy the raw data to a remote disk with a long-running bbcp task.
        5. Delete the raw data, but keep the GPU results and all the metadata
        
        I'm having trouble finding the right documentation that describes how I 
        can do this. Can you give me a top-level page? (I've looked at the wiki, 
        but it's a bit tricky to work out where to start).
        
        K
        
        
        -- 
        KEITH BANNISTER | Principal Research Engineer
        CSIRO Astronomy and Space Science
        T +61 2 9372 4295
        E [email protected]
        
    
    
