Hey, No worries.
TBH I haven't been in the MR world for a while, so if you think it makes more sense like that and that we won't be opening tons of KuduClients then I'm all for it. Thanks, J-D On Wed, Dec 27, 2017 at 2:47 PM, Clemens Valiente < [email protected]> wrote: > Hi J-D, > > thanks for the answer. I missed that part in the comment, teaches me to > check more closely in the future. Now I understand the reason. > > It might probably be a better idea to instantiate a separate KuduClient > for each RecordReader since they are not really designed to share the same > client anyway. > > Thoughts? > > > Clemens > > ________________________________ > From: Jean-Daniel Cryans <[email protected]> > Sent: 27 December 2017 23:37:54 > To: [email protected] > Subject: Re: Possible Race Condition in kudu-mapreduce > > Hi Clemens, > > The reason is here > https://github.com/apache/kudu/blob/master/java/kudu- > mapreduce/src/main/java/org/apache/kudu/mapreduce/ > KuduTableInputFormat.java#L72 > > I'd still be worried about that. You can look at what HBase does for > inspiration in fixing this: > https://github.com/apache/hbase/blob/master/hbase- > mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/ > TableInputFormatBase.java#L238 > > Thanks, > > J-D > > On Wed, Dec 27, 2017 at 6:01 AM, Clemens Valiente < > [email protected]> wrote: > > > Hi, > > > > I hope everyone had a nice long holiday weekend! > > > > > > I have a question regarding the kudu-mapreduce package, in particular > this > > line in the KuduTableInputFormat where getSplits() shuts down our Kudu > > client: > > > > https://github.com/cloudera/kudu/blob/master/java/kudu- > > mapreduce/src/main/java/org/apache/kudu/mapreduce/ > > KuduTableInputFormat.java#L164 > > > > Is there a reason for shutting down the client here? > > > > This does not work with Hive: > > > > In FetchInputFormatSplit, Hive uses the same InputFormat for fetching the > > splits and getting the recordReader (in our case, it is the > > KuduTableInputFormat.TableRecordReader). > > > > If Hive then tries to initialize that record reader, it runs into an > error > > here: > > > > https://github.com/cloudera/kudu/blob/master/java/kudu- > > mapreduce/src/main/java/org/apache/kudu/mapreduce/ > > KuduTableInputFormat.java#L397 > > > > since the TableRecordReader uses the same client of the > > KuduTableInputFormat that was already shut down by getSplits() > > > > > > Since the client is already shut down by the close() method on the > > KuduTableInputFormat, I don't see a need to do the same in getSplits() > > > > If there are no objections or a reason for keeping the shutdown call > > there, I would open a ticket and submit a patch for this. > > > > > > Cheers > > > > Clemens > > > > >
