> .. if the provider we are bringing up also
> provides the data store, we can just omit the data store for that benchmark
> and use what we've already brought up. Does that answer your question, or
> have I misunderstood?

Yes, and it is a perfect approach for this case, great idea.

> Great point -- I neglected to include the DirectRunner in the plans here.
> I'll add it to the doc and file a JIRA.

Excellent. This work is super interesting, so don't hesitate to ask the rest
of us in the community anything; I think many of us are interested and can
give a hand if needed.

On Thu, Mar 16, 2017 at 9:17 AM, Jason Kuster
<jasonkus...@google.com.invalid> wrote:

> Thanks Ismael for the comments! Replied inline.
>
> On Wed, Mar 15, 2017 at 8:18 AM, Ismaël Mejía <ieme...@gmail.com> wrote:
>
>> Excellent proposal, sorry to jump into this discussion so late, this
>> was in my to-read list for almost two weeks, and I finally got the time
>> to read the document and I have two minor comments:
>>
>> I have the impression that the strict separation of Providers (the
>> data-processing systems) and Resources (the concrete Data Stores)
>> makes sense for the general case, but is lacking if what we want to
>> test are things in the Hadoop ecosystem where the data stores commonly
>> co-exist in the same group of machines with the data-processing
>> systems (the Providers), e.g. HDFS, HBase + YARN. This is important to
>> correctly test that data locality works correctly for example. Have
>> you considered such case?
>>
>
> Definitely interesting to think about, and I don't think I added provisions
> for this in the doc. My impression, though, is that since the providers and
> the data stores are not coupled, if the provider we are bringing up also
> provides the data store, we can just omit the data store for that benchmark
> and use what we've already brought up. Does that answer your question, or
> have I misunderstood?
>
>>
>> Another thing I noticed is that in the list of runners supporting PKB
>> the Direct Runner is not included, is there any particular reason for
>> this?
>> I think that even if performance is not the main goal of the
>> direct runner it can be nice to have it there too to catch any
>> performance regressions, or is it because it is already ready for it?
>> what do you think?
>>
>
> Great point -- I neglected to include the DirectRunner in the plans here.
> I'll add it to the doc and file a JIRA.
>
>> Thanks,
>> Ismaël
>>
>> On Thu, Mar 2, 2017 at 11:49 PM, Amit Sela <amitsel...@gmail.com> wrote:
>> > Looks great, and I'll be sure to follow this. Ping me if I can assist in
>> > any way!
>> >
>> > On Fri, Mar 3, 2017 at 12:09 AM Ahmet Altay <al...@google.com.invalid>
>> > wrote:
>> >
>> >> Sounds great, thank you!
>> >>
>> >> On Thu, Mar 2, 2017 at 1:41 PM, Jason Kuster
>> >> <jasonkus...@google.com.invalid> wrote:
>> >>
>> >> > D'oh, my bad Ahmet. I've opened BEAM-1610, which handles support for
>> >> > Python in PKB against the Dataflow runner. Once the Fn API progresses
>> >> > some more we can add some work items for the other runners too. Let's
>> >> > chat about this more, maybe next week?
>> >> >
>> >> > On Thu, Mar 2, 2017 at 1:31 PM, Ahmet Altay <al...@google.com.invalid>
>> >> > wrote:
>> >> >
>> >> > > Thank you Jason, this is great.
>> >> > >
>> >> > > Which one of these issues fall into the land of sdk-py?
>> >> > >
>> >> > > Ahmet
>> >> > >
>> >> > > On Thu, Mar 2, 2017 at 12:34 PM, Jason Kuster
>> >> > > <jasonkus...@google.com.invalid> wrote:
>> >> > >
>> >> > > > Glad to hear the excitement. :)
>> >> > > >
>> >> > > > Filed BEAM-1595 - 1609 to track work items. Some of these fall under
>> >> > > > runner components, please feel free to reach out to me if you have
>> >> > > > any questions about how to accomplish these.
>> >> > > >
>> >> > > > Best,
>> >> > > >
>> >> > > > Jason
>> >> > > >
>> >> > > > On Wed, Mar 1, 2017 at 5:50 AM, Aljoscha Krettek
>> >> > > > <aljos...@apache.org> wrote:
>> >> > > >
>> >> > > > > Thanks for writing this and taking care of this, Jason!
>> >> > > > >
>> >> > > > > I'm afraid I also cannot add anything except that I'm excited to
>> >> > > > > see some results from this.
>> >> > > > >
>> >> > > > > On Wed, 1 Mar 2017 at 03:28 Kenneth Knowles <k...@google.com.invalid>
>> >> > > > > wrote:
>> >> > > > >
>> >> > > > > Just got a chance to look this over. I don't have anything to add,
>> >> > > > > but I'm pretty excited to follow this project. Have the JIRAs been
>> >> > > > > filed since you shared the doc?
>> >> > > > >
>> >> > > > > On Wed, Feb 22, 2017 at 10:38 AM, Jason Kuster
>> >> > > > > <jasonkus...@google.com.invalid> wrote:
>> >> > > > >
>> >> > > > > > Hey all, just wanted to pop this up again for people -- if anyone
>> >> > > > > > has thoughts on performance testing please feel welcome to chime
>> >> > > > > > in. :)
>> >> > > > > >
>> >> > > > > > On Fri, Feb 17, 2017 at 4:03 PM, Jason Kuster
>> >> > > > > > <jasonkus...@google.com> wrote:
>> >> > > > > >
>> >> > > > > > > Hi all,
>> >> > > > > > >
>> >> > > > > > > I've written up a doc on next steps for getting performance
>> >> > > > > > > testing up and running for Beam. I'd love to hear from people --
>> >> > > > > > > there's a fair amount of work encapsulated in here, but the end
>> >> > > > > > > result is that we have a performance testing system which we can
>> >> > > > > > > use for benchmarking all aspects of Beam, which would be really
>> >> > > > > > > exciting. Looking forward to your thoughts.
>> >> > > > > > >
>> >> > > > > > > https://docs.google.com/document/d/1PsjGPSN6FuorEEPrKEP3u3m16tyOzph5FnL2DhaRDz0/edit?ts=58a78e73
>> >> > > > > > >
>> >> > > > > > > Best,
>> >> > > > > > >
>> >> > > > > > > Jason
>> >> > > > > > >
>> >> > > > > > > --
>> >> > > > > > > -------
>> >> > > > > > > Jason Kuster
>> >> > > > > > > Apache Beam / Google Cloud Dataflow