Yes, this will change. Apache Beam has been working toward a general solution for making all IO connectors modular [1]. That would let you read from an arbitrary number of sources, chaining the output of one into the next.
1: https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html

On Mon, Oct 29, 2018 at 9:57 AM Chaim Turkel <[email protected]> wrote:
> Both solutions mean that I cannot use the Beam IO classes that would
> handle the distribution for me; instead I would have to fetch the data
> myself in a ParDo method. Is this something that will change in the
> future? I understand that Spark has a push-down mechanism that passes
> the filter down to the next level of queries.
> chaim
>
> On Mon, Oct 22, 2018 at 4:02 PM Jeff Klukas <[email protected]> wrote:
> >
> > Chaim - If the full list of IDs fits comfortably in memory and the
> > Mongo collection is small enough that you can read the whole
> > collection, you may want to fetch the IDs into a Java collection
> > using the BigQuery API directly, then turn them into a Beam
> > PCollection using Create.of(collection_of_ids). You could then use
> > MongoDbIO.read() to read the entire collection, but throw out rows
> > based on the side input of IDs.
> >
> > If the list of IDs is particularly small, you could fetch the
> > collection into memory and parse it into a string filter that you
> > pass to MongoDbIO.read() to specify which documents to fetch,
> > avoiding the need for a side input.
> >
> > Otherwise, if it's a large number of IDs, you may need to use Beam's
> > BigQueryIO to create a PCollection of the IDs, and then pass that
> > into a ParDo with a custom DoFn that issues Mongo queries for a batch
> > of IDs. I'm not very familiar with the Mongo APIs, but you'd need to
> > give the DoFn a connection to Mongo that's serializable. You could
> > likely look at the implementation of MongoDbIO for inspiration there.
> >
> > On Sun, Oct 21, 2018 at 5:18 AM Chaim Turkel <[email protected]> wrote:
> >>
> >> hi,
> >> I have the following flow I need to implement.
> >> From BigQuery I run a query and get a list of IDs; then I need to
> >> load from Mongo all the documents based on these IDs and export them
> >> as an XML file.
> >> How do you suggest I go about doing this?
> >>
> >> chaim
> >>
> >> --
> >> Loans are funded by FinWise Bank, a Utah-chartered bank located in
> >> Sandy, Utah, member FDIC, Equal Opportunity Lender. Merchant Cash
> >> Advances are made by Behalf. For more information on ECOA, click
> >> here <https://www.behalf.com/legal/ecoa/>. For important information
> >> about opening a new account, review Patriot Act procedures here
> >> <https://www.behalf.com/legal/patriot/>. Visit Legal
> >> <https://www.behalf.com/legal/> to review our comprehensive program
> >> terms, conditions, and disclosures.
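Jeff's first approach (read the whole Mongo collection with MongoDbIO.read(), then throw out rows not in the side input of IDs) boils down to a set-membership filter inside a DoFn. A minimal sketch of that filtering step in plain Java, outside a running Beam pipeline, with plain Maps standing in for Mongo documents (the `_id` field name and String IDs are assumptions for illustration):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class SideInputFilter {
    // Given the set of IDs fetched from BigQuery (the "side input" in
    // the Beam version), keep only the documents whose _id is in it.
    // In a real pipeline this predicate would live inside a DoFn's
    // processElement, reading the ID set from a side input view.
    public static List<Map<String, Object>> filterByIds(
            List<Map<String, Object>> documents, Set<String> ids) {
        return documents.stream()
                .filter(doc -> ids.contains(doc.get("_id")))
                .collect(Collectors.toList());
    }
}
```

Note that this still reads (and then discards) the full collection, which is why it only makes sense when the collection is small enough to scan.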
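Jeff's second approach (for a small ID list, build a string filter and hand it to MongoDbIO.read() so only matching documents are fetched) could use a MongoDB `$in` query document. A hedged sketch of building that JSON filter string; it assumes the IDs are plain strings and that `_id` is the field to match on:

```java
import java.util.List;
import java.util.stream.Collectors;

public class InFilterBuilder {
    // Build a MongoDB $in filter as a JSON string, e.g.
    // {"_id": {"$in": ["a", "b"]}}. IDs are assumed to be plain
    // strings with no characters that need JSON escaping.
    public static String inFilter(String field, List<String> ids) {
        String quoted = ids.stream()
                .map(id -> "\"" + id + "\"")
                .collect(Collectors.joining(", "));
        return "{\"" + field + "\": {\"$in\": [" + quoted + "]}}";
    }
}
```

The resulting string would then be passed to MongoDbIO.read() via its filter option, so the query is pushed down to Mongo rather than filtered in the pipeline.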
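Jeff's third approach (a custom DoFn that issues one Mongo query per batch of IDs) needs the ID stream grouped into fixed-size batches first. A minimal, hypothetical batching helper that the DoFn could use, shown outside Beam (in a real pipeline the grouping would more likely be done with a Beam transform such as GroupIntoBatches):

```java
import java.util.ArrayList;
import java.util.List;

public class IdBatcher {
    // Split a large ID list into fixed-size batches so downstream code
    // can issue one Mongo $in query per batch rather than one per ID.
    // The last batch may be smaller than the requested size.
    public static List<List<String>> batches(List<String> ids, int size) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += size) {
            int end = Math.min(i + size, ids.size());
            out.add(new ArrayList<>(ids.subList(i, end)));
        }
        return out;
    }
}
```

As Jeff notes, the harder part is giving the DoFn a serializable Mongo connection, typically by creating the client lazily in a setup method rather than holding it as a field; the MongoDbIO source is a good reference for that.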
