At roughly how many column families would this change show a performance boost?
Cheers

> On Aug 27, 2015, at 4:56 PM, Himanshu Verma <[email protected]> wrote:
>
> Hi,
>
> I was looking at the following method:
>
>> public void doBulkLoad(Path hfofDir, final Admin admin, Table table,
>>     RegionLocator regionLocator) throws TableNotFoundException,
>>     IOException {
>
> We can optimize the following part of this method:
>
>> 353 ArrayList<String> familyNames = new ArrayList<String>(families.size());
>> 354 for (HColumnDescriptor family : families) {
>> 355   familyNames.add(family.getNameAsString());
>> 356 }
>> 357 ArrayList<String> unmatchedFamilies = new ArrayList<String>();
>> 358 Iterator<LoadQueueItem> queueIter = queue.iterator();
>> 359 while (queueIter.hasNext()) {
>> 360   LoadQueueItem lqi = queueIter.next();
>> 361   String familyNameInHFile = Bytes.toString(lqi.family);
>> 362   if (!familyNames.contains(familyNameInHFile)) {
>> 363     unmatchedFamilies.add(familyNameInHFile);
>> 364   }
>> 365 }
>
> Line 353 uses an ArrayList for familyNames and calls its "contains"
> method (line 362), which is O(n). We can use a HashSet instead; its
> "contains" method is O(1).
>
> This should improve performance for tables with a large number of
> column families.
>
> This is my first time here. I can make this change if everything looks
> fine.
>
> Regards,
> Himanshu Verma
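
For reference, the proposed change might look roughly like the sketch below. It is a standalone illustration, not the actual HBase code: the class UnmatchedFamilies, the method findUnmatched, and the plain String / byte[] inputs are hypothetical stand-ins for the HColumnDescriptor and LoadQueueItem objects used in doBulkLoad, and the UTF-8 decoding stands in for Bytes.toString. The only substantive difference from the quoted loop is that familyNames is a HashSet, so each "contains" call is an expected constant-time hash lookup instead of a linear scan. With only a handful of column families the difference is probably too small to measure.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnmatchedFamilies {

    // Hypothetical helper: returns the family names found in the queued
    // HFiles that are not among the table's column families.
    public static List<String> findUnmatched(List<String> tableFamilies,
                                             List<byte[]> hfileFamilies) {
        // HashSet instead of ArrayList: contains() is a hash lookup.
        Set<String> familyNames = new HashSet<>(tableFamilies);
        List<String> unmatchedFamilies = new ArrayList<>();
        for (byte[] family : hfileFamilies) {
            // Assumption: family names are UTF-8 encoded bytes.
            String familyNameInHFile = new String(family, StandardCharsets.UTF_8);
            if (!familyNames.contains(familyNameInHFile)) {
                unmatchedFamilies.add(familyNameInHFile);
            }
        }
        return unmatchedFamilies;
    }

    public static void main(String[] args) {
        List<String> tableFamilies = Arrays.asList("cf1", "cf2");
        List<byte[]> queued = Arrays.asList(
                "cf1".getBytes(StandardCharsets.UTF_8),
                "cf3".getBytes(StandardCharsets.UTF_8));
        System.out.println(findUnmatched(tableFamilies, queued)); // prints [cf3]
    }
}
```

Total work for the check drops from O(n * m) to roughly O(n + m), where n is the number of column families and m is the number of queued HFiles.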
