Hi Susana,

Aggregating this and the off-list email: you could technically connect the
File Manager to the users' storage, but that's probably not the right way
to go about it.

OODT is a toolbox at the end of the day, so you pick the parts that
enhance what you're already doing. One such part certainly seems to be the
ingestion of data and the capture of metadata, both of which the File
Manager can handle; OODT would then be the gateway to the ingested files.
Off the back of that you could implement a workflow, triggered
post-ingestion or on a timer, that figures out what to do with your data.
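To make the ingestion half concrete, here is a minimal Python sketch of
handing a file plus captured metadata to the File Manager over its XML-RPC
endpoint. The port, the RPC method name, and the metadata keys are all
assumptions for illustration; in practice you would normally use the Java
`XmlRpcFileManagerClient` or the CAS crawler rather than calling the
endpoint by hand.

```python
import os
import xmlrpc.client

def build_metadata(path):
    """Collect the metadata we want catalogued alongside the product.

    In a real deployment a CAS metadata extractor would do this; the
    keys below are purely illustrative.
    """
    return {
        "ProductName": os.path.basename(path),
        "ProductType": "GenericFile",
        "FileSize": str(os.path.getsize(path)) if os.path.exists(path) else "0",
    }

def ingest(path, filemgr_url="http://localhost:9000"):
    # 9000 is a common default File Manager port, and the method name
    # below is an assumption -- check your deployment before relying
    # on either.
    server = xmlrpc.client.ServerProxy(filemgr_url)
    metadata = build_metadata(path)
    return server.filemgr.ingestProduct(metadata["ProductName"], metadata, True)
```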

For example: a process ingests new data -> this triggers a workflow -> the
workflow inspects the new data and looks up the metadata for the new files
-> the workflow then fires up a GridFTP client, or whatever delivery
mechanism you use, to deliver the files to the end user.

Of course, in reality the workflow could be any number of steps and scale
in many different ways, but that is one very simple OODT workflow overview.
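As a sketch of that pipeline's decision logic in Python: look up the
catalogued metadata, match it against per-user subscriptions, and shell out
to a GridFTP client. The subscription table, metadata keys, and destination
URLs below are invented for illustration; `globus-url-copy` is the standard
GridFTP command-line client.

```python
import subprocess

# Hypothetical per-user subscriptions: which ProductType each user wants
# delivered, and where. Entirely illustrative.
SUBSCRIPTIONS = {
    "alice": {"product_type": "RadioCube", "dest": "gsiftp://host-a/data/"},
    "bob": {"product_type": "OpticalImage", "dest": "gsiftp://host-b/incoming/"},
}

def select_deliveries(product_metadata):
    """Match one product's metadata against the user subscriptions."""
    return [
        (user, sub["dest"])
        for user, sub in sorted(SUBSCRIPTIONS.items())
        if sub["product_type"] == product_metadata.get("ProductType")
    ]

def gridftp_command(source_path, dest_url):
    """Build a globus-url-copy invocation (-p parallel streams and
    -fast data-channel reuse are typical flags, not required ones)."""
    return ["globus-url-copy", "-p", "4", "-fast",
            "file://" + source_path, dest_url]

def deliver(source_path, product_metadata):
    for user, dest in select_deliveries(product_metadata):
        cmd = gridftp_command(source_path, dest)
        # subprocess.run(cmd, check=True)  # uncomment on a host with GridFTP
        print("would deliver to", user, ":", " ".join(cmd))
```

The filtering step is what answers the "deliver only selected products per
user" requirement: it is ordinary metadata matching against the catalogue,
not anything the File Manager does for you.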

Tom

On Fri, Feb 17, 2017 at 10:19 AM, Susana Sanchez Exposito <s...@iaa.es>
wrote:

> Thanks Tom,
>
> From your answer I guess that I can use the OODT File Manager component
> to deliver large data products (from GBs to TBs) to users located
> remotely (i.e. users that are globally distributed).
>
> I still have some doubts; let me add them between your lines:
>
> 2017-02-16 13:18 GMT+01:00 Tom Barber <magicaltr...@apache.org>:
>
> > Hi Susana
> >
> > Welcome to the OODT list, this is indeed the correct place to ask about
> > OODT related stuff.
> >
> > How you deliver data, I guess often depends on your requirements, but
> OODT
> > was certainly designed with that type of thing in mind.
> >
> > The File Manager is very flexible in terms of storage and is a portal
> > allowing for the ingestion of data products into a file store. This
> > could be a folder on a disk, an NFS mount, an HDFS cluster, S3, or
> > something completely different. So the system will ingest data into the
> >
>
> Do you mean that I can connect the File Manager to the users' file
> stores, so that when the File Manager stores the data products it would,
> in practice, be delivering them to the users?
>
> Given that the users' file stores would be located remotely (possibly
> reached over high-latency networks), I would be worried about the
> performance of this option.
>
> In addition, with this option I would not be able to select/filter which
> data products are delivered to each user based on the metadata of the
> products.
>
>
>
>
> > file manager either through an API call, a crawling service, or
> > something else. During this operation, metadata is extracted from the
> > ingested files; for example, if the file were an image you could
> > extract EXIF data, geo data, etc., and store that in the catalogue
> > alongside the ingested product.
> >
> > There is a basic UI for showing ingested products called the Ops UI,
> > but in reality, for deployment as a service, there would be a web
> > interface written to integrate with whatever application or portal you
> > are already using, which would then allow users to search for products
> > via metadata or keys in the ingested data. From that search users could
> > then do a range of things depending on what your requirements are, the
> > simplest being clicking a link to download the product. But of course
> > it could be triggering a workflow, copying the file somewhere else, or
> > whatever.
> >
> > Behind the File Manager is also the Workflow Manager, so another
> > scenario might be to ingest files into the File Manager, which in turn
> > triggers a workflow that then distributes the ingested files to people
> > automatically, or performs some post-processing, etc.
> >
>
> Ok. So, I would need to implement this workflow in such a way that 1) it
> selects/filters which data products will be delivered to each user, and
> 2) it sends the data products to the remote users by means of efficient
> tools for data movement (e.g. GridFTP).
>
>
>
> >
> > Let us know if you have any further questions.
> >
>
> Thanks again!
>
> Susana.
>
>
>
> >
> > Tom
> >
> > On Thu, Feb 16, 2017 at 7:56 AM, Susana Sanchez <susanasan...@gmail.com>
> > wrote:
> >
> > > Dear all,
> > >
> > > I am trying to find out which of the components of Apache OODT is
> > > the most suitable for delivering large data products to users located
> > > remotely (users distributed over a WAN).
> > >
> > > I have read that the CAS File Manager has the capability to archive
> > > a file to a remote location, so it could be a candidate. However, it
> > > seems this component was not designed for this purpose, so it is not
> > > recommended for distributing data over a WAN. Is that correct?
> > >
> > > I think the components that I am looking for are the Grid product
> > > services (Product server/client, Profile server/client, Query
> > > server/client). Am I right?
> > > If not, I would appreciate some information about which OODT
> > > components I need to distribute data products over international
> > > networks.
> > >
> > > I was not sure if this is the correct mailing list for this kind of
> > > question. If not, sorry about that, and it would be appreciated if
> > > you could forward it to the appropriate email address.
> > >
> > > Thanks in advance,
> > > Susana.
> > >
> >
>
>
>
> --
> Susana Sánchez Expósito
>
> Instituto de Astrofísica de Andalucía - CSIC
> Glorieta de la Astronomía, s/n. E-18008, Granada
> Tel:(+34) 958 121 311 / (+34) 958 230 635
> Fax:(+34) 958 814 530
> e-mail: s...@iaa.es
>
