Thanks again Tom,

So, it seems that the OODT component I am looking for is OODT Workflow. I need to investigate how to use this component to implement a data delivery service, so I would like to ask you for documentation about it.
So far, I have installed Apache OODT (https://cwiki.apache.org/confluence/display/OODT/RADiX+Powered+By+OODT) and I have been playing around with the File Manager component, following this document:
https://cwiki.apache.org/confluence/display/OODT/OODT+Filemgr+User+Guide

However, I did not find a similar document for the OODT Workflow component. I have only seen these wiki pages:
https://cwiki.apache.org/confluence/display/OODT/Workflow2+Quick+Start+Guide
https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide

I don't know the difference between Workflow1 and Workflow2, so I am not sure whether these are the guides that I should follow.

I have also found this tutorial:
https://oodt.apache.org/site_docs/cas-workflow/user/basic.html

But I think I would need something more to start working with this component, so if you can point me to other tutorials or documentation I would be very grateful.

Susana.

2017-02-17 16:50 GMT+01:00 Tom Barber <tom.bar...@meteorite.bi>:

> Hi Susana,
>
> Aggregating this and the off list email, you could technically connect the
> FM to the users' storage but that's probably not the correct way to go about
> it.
>
> OODT is a toolbox at the end of the day so you pick the parts that enhance
> what you're already doing. One seems to certainly be the ingestion of data
> and capturing of metadata which could certainly be executed by the File
> Manager and as such OODT would then be the gateway to the ingested files.
> Off the back of that you could then implement a workflow that would trigger
> post ingestion or timed or whatever that would then figure out what to do
> with your data.
>
> For example, process ingests new data -> triggers workflow -> workflow
> looks at new data and looks up the metadata for the new files -> workflow
> then fires up GridFTP client or whatever delivery mechanism you use to
> deliver files to end user
>
> Of course in reality the workflow could be any number of steps and scale in
> many different ways, but that is one very simple OODT workflow overview.
>
> Tom
>
> On Fri, Feb 17, 2017 at 10:19 AM, Susana Sanchez Exposito <s...@iaa.es>
> wrote:
>
> > Thanks Tom,
> >
> > From your answer I guess that I can use the OODT component File Manager
> > to deliver large data products (from GBs to TBs) to users located
> > remotely (i.e. users that are globally distributed).
> >
> > I still have some doubts, let me add them between your lines:
> >
> > 2017-02-16 13:18 GMT+01:00 Tom Barber <magicaltr...@apache.org>:
> >
> > > Hi Susana
> > >
> > > Welcome to the OODT list, this is indeed the correct place to ask about
> > > OODT related stuff.
> > >
> > > How you deliver data, I guess often depends on your requirements, but
> > > OODT was certainly designed with that type of thing in mind.
> > >
> > > The file manager is very flexible in terms of storage and is a portal
> > > allowing for the ingestion of data products to a file store, this could
> > > be a folder on a disk, nfs mount or something else, a HDFS cluster, S3
> > > or something completely different. So the system will ingest data into
> > > the
> > >
> >
> > Do you mean that I can connect the File Manager with the users' file
> > stores, so when the File Manager stores the data products, in practice,
> > what it would be doing is delivering the data products to the users?
> >
> > Given the users' file stores would be located remotely (possibly through
> > high latency networks), I would be worried about the performance of this
> > option.
> >
> > In addition, with this option I would not be able to select/filter which
> > data products are delivered to each user, based on the metadata of the
> > products.
> >
> > > file manager either through an API call, a crawling service or
> > > something else. During this operation metadata from the ingested files
> > > is then extracted, for example if this were an image, you could extract
> > > EXIF data, GEO data etc and then store that in the catalogue alongside
> > > the ingested product.
> > >
> > > There is a basic UI for showing ingested products called Ops UI, but in
> > > reality for deployment as a service there would be a web interface
> > > written to integrate into whatever application or portal you are
> > > already using, which would then allow users to search for products via
> > > metadata or keys in the ingested data. From that search users could
> > > then do a range of things depending on what your requirements are, the
> > > simplest being clicking a link to download the product. But of course
> > > it could be triggering a workflow, copying the file somewhere else or
> > > whatever.
> > >
> > > Behind the File Manager is also the workflow manager, so another
> > > scenario might be to ingest files into the file manager, which in turn
> > > triggers a workflow which then distributes the ingest files to people
> > > automatically, or performs some post processing etc.
> > >
> >
> > Ok. So, I would need to implement this workflow in such a way that 1) it
> > selects/filters which data products will be delivered to each user and
> > 2) it sends the data products to the remote users, by means of efficient
> > tools for data movement (e.g. GridFTP)
> >
> > >
> > > Let us know if you have any further questions.
> > >
> >
> > Thanks again!
> >
> > Susana.
> >
> > >
> > > Tom
> > >
> > > On Thu, Feb 16, 2017 at 7:56 AM, Susana Sanchez
> > > <susanasan...@gmail.com> wrote:
> > >
> > > > Dear all,
> > > >
> > > > I am trying to find out which of the components of Apache OODT is the
> > > > most suitable for delivering large data products to users located
> > > > remotely (users distributed on a WAN network)
> > > >
> > > > I have read the CAS File Manager has the capability to archive a file
> > > > to a remote location, so it could be a candidate. However, it seems
> > > > this component was not designed for this purpose, so it is not
> > > > recommended for distributing data through a WAN network. Is that
> > > > correct?
> > > >
> > > > I think the components that I am looking for are the Grid product
> > > > services (Product server/client, Profile server/client, Query
> > > > server/client). Am I right?
> > > > If not, I would like to ask you to provide some information about
> > > > which OODT components I need to distribute data products through
> > > > international networks.
> > > >
> > > > I was not sure if this is the correct email list to send this kind of
> > > > question. If not, sorry about that and it would be appreciated if you
> > > > could forward it to the appropriate email address.
> > > >
> > > > Thanks in advance,
> > > > Susana.
> > > >
> > >
> >
> > --
> > Susana Sánchez Expósito
> >
> > Instituto de Astrofísica de Andalucía - CSIC
> > Glorieta de la Astronomía, s/n. E-18008, Granada
> > Tel:(+34) 958 121 311 / (+34) 958 230 635
> > Fax:(+34) 958 814 530
> > e-mail: s...@iaa.es
>

--
Susana Sánchez Expósito
Instituto de Astrofísica de Andalucía - CSIC
Glorieta de la Astronomía, s/n. E-18008, Granada
Tel:(+34) 958 121 311 / (+34) 958 230 635
Fax:(+34) 958 814 530
e-mail: s...@iaa.es
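
For illustration only, the two workflow steps discussed in the thread (select which products go to which user based on metadata, then hand them to a transfer tool such as GridFTP) might be sketched as below. This is a rough sketch under stated assumptions, not OODT code: it does not call any OODT API, and the names Product, Subscription and plan_deliveries, the metadata keys and the destination URLs are all hypothetical. It only builds globus-url-copy command lines (a GridFTP command-line client) and does not execute them.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """An ingested data product plus the metadata captured at ingestion."""
    path: str        # absolute path of the file in the archive
    metadata: dict   # hypothetical key/value metadata


@dataclass
class Subscription:
    """A remote user and the metadata values they want delivered (hypothetical)."""
    user: str
    destination: str                      # e.g. a gsiftp:// URL prefix
    wanted: dict = field(default_factory=dict)

    def matches(self, product: Product) -> bool:
        # A product matches when every wanted key carries the wanted value.
        # An empty filter therefore matches every product.
        return all(product.metadata.get(k) == v for k, v in self.wanted.items())


def plan_deliveries(products, subscriptions):
    """Return one transfer command (as an argument list) per matching pair."""
    commands = []
    for sub in subscriptions:
        for product in products:
            if sub.matches(product):
                name = product.path.rsplit("/", 1)[-1]
                commands.append([
                    "globus-url-copy",
                    "file://" + product.path,
                    sub.destination.rstrip("/") + "/" + name,
                ])
    return commands
```

In a real deployment the list of products and their metadata would presumably come from the File Manager catalogue, and each command would be run by a workflow task; both of those pieces are left out here on purpose.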