We'll open one KuduClient per accessed Table (for the getSplits) and one Kuduclient per accessed Tablets per Table.
In the MR world we would usually spawn one Mapper per inputsplit and therefore one separate KuduClient anyway, this change would not increase the number of open clients a lot. To be honest I think having the getSplits() and getRecordreader() called in one and the same container might be a special case anyway, I'll have to check that again tomorrow. Anyway, I'll submit a change request sometime this week hopefully. Cheers Clemens ________________________________ From: Jean-Daniel Cryans <[email protected]> Sent: 28 December 2017 00:04:53 To: [email protected] Subject: Re: Possible Race Condition in kudu-mapreduce Hey, No worries. TBH I haven't been in the MR world for a while, so if you think it makes more sense like that and that we won't be opening tons of KuduClients then I'm all for it. Thanks, J-D On Wed, Dec 27, 2017 at 2:47 PM, Clemens Valiente < [email protected]> wrote: > Hi J-D, > > thanks for the answer. I missed that part in the comment, teaches me to > check more closely in the future. Now I understand the reason. > > It might probably be a better idea to instantiate a separate KuduClient > for each RecordReader since they are not really designed to share the same > client anyway. > > Thoughts? > > > Clemens > > ________________________________ > From: Jean-Daniel Cryans <[email protected]> > Sent: 27 December 2017 23:37:54 > To: [email protected] > Subject: Re: Possible Race Condition in kudu-mapreduce > > Hi Clemens, > > The reason is here > https://github.com/apache/kudu/blob/master/java/kudu- > mapreduce/src/main/java/org/apache/kudu/mapreduce/ > KuduTableInputFormat.java#L72 > > I'd still be worried about that. You can look at what HBase does for > inspiration in fixing this: > https://github.com/apache/hbase/blob/master/hbase- > mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/ > TableInputFormatBase.java#L238 > > Thanks, > > J-D > > On Wed, Dec 27, 2017 at 6:01 AM, Clemens Valiente < > [email protected]> wrote: > > > Hi, > > > > I hope everyone had a nice long holiday weekend! > > > > > > I have a question regarding the kudu-mapreduce package, in particular > this > > line in the KuduTableInputFormat where getSplits() shuts down our Kudu > > client: > > > > https://github.com/cloudera/kudu/blob/master/java/kudu- > > mapreduce/src/main/java/org/apache/kudu/mapreduce/ > > KuduTableInputFormat.java#L164 > > > > Is there a reason for shutting down the client here? > > > > This does not work with Hive: > > > > In FetchInputFormatSplit, Hive uses the same InputFormat for fetching the > > splits and getting the recordReader (in our case, it is the > > KuduTableInputFormat.TableRecordReader). > > > > If Hive then tries to initialize that record reader, it runs into an > error > > here: > > > > https://github.com/cloudera/kudu/blob/master/java/kudu- > > mapreduce/src/main/java/org/apache/kudu/mapreduce/ > > KuduTableInputFormat.java#L397 > > > > since the TableRecordReader uses the same client of the > > KuduTableInputFormat that was already shut down by getSplits() > > > > > > Since the client is already shut down by the close() method on the > > KuduTableInputFormat, I don't see a need to do the same in getSplits() > > > > If there are no objections or a reason for keeping the shutdown call > > there, I would open a ticket and submit a patch for this. > > > > > > Cheers > > > > Clemens > > > > >
