> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/build.xml, line 110 > > <https://reviews.apache.org/r/1184/diff/1/?file=26836#file26836line110> > > > > Instead of checking it in lib, is it possible to pull this lib through > > ivy ?
I don't know if this is exported by any particular artifact. It was part of hadoop-core, but as of 0.20.203/CDH3, it is not present there. Back in hadoop trunk and 0.20.204 onwards, it's supposed to have made a comeback. For now, better to keep it here for now and then pull it out once the primary hadoop version we depend on has it, I think. > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/common/HCatConstants.java, lines 67-69 > > <https://reviews.apache.org/r/1184/diff/1/?file=26838#file26838line67> > > > > Do we need these keys? Looks like you are propagating the token already > > acquired in the first job to the second one. This is done to mimic our usage of the hive delegation token, because we need to refer to it to be able to cancel this. It might be possible to experiment with pulling the credentials out of the jobcontext and after checking .getKind() to see if it's a MR token, and checking whether or not we set it as opposed to it having been set by oozie or other external clients, but this was the easy way of it. It's worth experimenting with a secure cluster to optimize this. > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/common/HCatConstants.java, line 78 > > <https://reviews.apache.org/r/1184/diff/1/?file=26838#file26838line78> > > > > Will it be better to externalize this property? Otherwise one has to > > recompile hcatalog turn this off. The current idea is that this check is not really necessary, and it has to be an architectural discussion before we decide to enable it again. I used a boolean setting and if-guards instead of outright commenting out the code or not having it there so as to reverse it later on. If we discuss this and decide to have this feature enabled or configurable, we can make it configurable. > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/common/HCatUtil.java, line 317 > > <https://reviews.apache.org/r/1184/diff/1/?file=26839#file26839line317> > > > > Do we need all these methods ? Looks like you no longer need not to > > acquire token from JT. I do need to acquire the token from the JobTracker and I do use it - it is being used even if not explicitly set - that's what the exporting of the tokens dir does. > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/har/HarOutputCommitterPostProcessor.java, > > line 41 > > <https://reviews.apache.org/r/1184/diff/1/?file=26840#file26840line41> > > > > Commented code. Either remove it or uncomment it. I think these logging > > statements should be enabled. (refer above to needing to pull logging out to a separate class, then we can stop instantiating static LOGs altogether) > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java, line > > 91 > > <https://reviews.apache.org/r/1184/diff/1/?file=26845#file26845line91> > > > > Looks like you are not using this map later on. We should probably > > remove it. Agreed, the functionality was replaced by discoverPartitions(), since we needed more info than this provided. This I'll remove and update the patch. (TODO:remove baseCommitters) > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/har/HarOutputCommitterPostProcessor.java, > > lines 135-145 > > <https://reviews.apache.org/r/1184/diff/1/?file=26840#file26840line135> > > > > Commented code. I figured it might be useful in the future if we need to debug whether or not the har is valid. This I can live without. (TODO:remove commented code) > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/common/HCatUtil.java, lines 329-366 > > <https://reviews.apache.org/r/1184/diff/1/?file=26839#file26839line329> > > > > Looks little unnatural to do logging this way. We should take advantage > > of the logging library to accomplish these tasks. Agreed on that. :) We should pull out a bunch of these log functions into a proper log class, and also start using log4j proper instead of of only when we debug. Not immediately in the scope of this patch, I'd suggest. > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java, line > > 108 > > <https://reviews.apache.org/r/1184/diff/1/?file=26845#file26845line108> > > > > This I think is because you don't have a list of base committers > > initially. But it may be possible to get that list from HCatRecordWriter at > > this point and then call commitTask() on them. Calling commitTask from > > close() looks awkward, if there is a way we should avoid that. There isn't a really good way for HCatRecordWriter to communicate this to the HCatOutputCommitter. The current approach comes closest to what we consider the equivalent of the underlying committer's commitTask(), and the RecordWriter is a good place to put it, because that is very much in the task scope. > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java, > > lines 132-137 > > <https://reviews.apache.org/r/1184/diff/1/?file=26845#file26845line132> > > > > Instead of special casing for dynamic partitioning we should always > > call baseComitter.setupTask() here. In case of dynamic partitions, this > > will become noop from here and latter call it again from write() to do the > > real work. Again, as above, there is no way for us to know the baseCommitter in HCatOutputCommitter in a dynamic partitioning usecase. > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java, > > lines 179-185 > > <https://reviews.apache.org/r/1184/diff/1/?file=26845#file26845line179> > > > > Looks like this isn't needed anymore. This is still needed and used. Without getting the jobclient token, we cannot launch the har task. > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java, line > > 143 > > <https://reviews.apache.org/r/1184/diff/1/?file=26845#file26845line143> > > > > Can you explain the purpose of this boolean? Looks like it can be > > derived from existing variables. It serves as multiple-call guard for discoverPartitions() which can be a costly call to make, and calling multiple times should be avoided if it can be helped. One thing I will agree with - it's cleaner to put this callguard inside discoverPartitions and have it be in only one place rather than in other functions that call it. I can make that change. (TODO: move partitionsDiscovered check to inside discoverPartitions() ) > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java, line > > 641 > > <https://reviews.apache.org/r/1184/diff/1/?file=26845#file26845line641> > > > > Why we depend on this variable? Shouldn't we always do fs.exists() > > check? There are two issues here. a) The first is needing to check before a move, and fs.rename will fail in cases where fs.exists would have alerted us. So yes, we do always do a check, and we're good. b) The second is an ability to do a check for all the moves that we would do, to see if any of them would have failed, before we even begin trying to move. This is needed especially for the case where we har, because that will not have the luxury of falling back on the add_partitions() failing before we do the move, and so, if there are any existing partitions in the directory, then it's needed that we know before we start trying to copy and fail mid-way. The dryRun flag gives us that ability to pre-check before we begin moving. > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitterPostProcessor.java, > > line 32 > > <https://reviews.apache.org/r/1184/diff/1/?file=26846#file26846line32> > > > > I wonder if we are too early in introducing this interface. There is > > only one impl which is always called (in case of dynamic partitions). I > > think we should introduce interfaces only if there is a known use case for > > it. Further when other use cases do come up, good likelihood that we need > > to tweak the interface to accomodate them. This I agree with, and that's why I didn't attempt to make the usage so generic as to allow other PostProcessors beyond har. The reason it's there at all is because I wanted to keep the har code pluggable if we wanted to modify later. I'm okay with removing the interface and just using the HarPostProcessor itself as is without the @Overrides (TODO: discuss and potentially remove) > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/pig/HCatStorer.java, lines 112-118 > > <https://reviews.apache.org/r/1184/diff/1/?file=26853#file26853line112> > > > > Looks like some of these are not required. No, as in comments above, we still do use it. > On 2011-07-22 18:01:31, Ashutosh Chauhan wrote: > > trunk/src/java/org/apache/hcatalog/pig/PigHCatUtil.java, lines 409-420 > > <https://reviews.apache.org/r/1184/diff/1/?file=26854#file26854line409> > > > > retrieve.. do you want to rename it to getFromUDF.. True. Unnecessary obfuscation. Sorry! :) (TODO:rename) - Sushanth ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1184/#review1167 ----------------------------------------------------------- On 2011-07-22 16:09:41, Ashutosh Chauhan wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/1184/ > ----------------------------------------------------------- > > (Updated 2011-07-22 16:09:41) > > > Review request for hcatalog. > > > Summary > ------- > > HCatalog-42 review request on behalf of Sushanth > > > This addresses bug HCATALOG-42. > https://issues.apache.org/jira/browse/HCATALOG-42 > > > Diffs > ----- > > trunk/build.xml 1149353 > trunk/src/java/org/apache/hcatalog/common/ErrorType.java 1149353 > trunk/src/java/org/apache/hcatalog/common/HCatConstants.java 1149353 > trunk/src/java/org/apache/hcatalog/common/HCatUtil.java 1149353 > trunk/src/java/org/apache/hcatalog/har/HarOutputCommitterPostProcessor.java > PRE-CREATION > trunk/src/java/org/apache/hcatalog/mapreduce/HCatBaseOutputCommitter.java > 1149353 > trunk/src/java/org/apache/hcatalog/mapreduce/HCatBaseOutputFormat.java > 1149353 > trunk/src/java/org/apache/hcatalog/mapreduce/HCatEximOutputCommitter.java > 1149353 > trunk/src/java/org/apache/hcatalog/mapreduce/HCatEximOutputFormat.java > 1149353 > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java > 1149353 > > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitterPostProcessor.java > PRE-CREATION > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java 1149353 > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputStorageDriver.java > 1149353 > trunk/src/java/org/apache/hcatalog/mapreduce/HCatRecordWriter.java 1149353 > trunk/src/java/org/apache/hcatalog/mapreduce/HCatTableInfo.java 1149353 > trunk/src/java/org/apache/hcatalog/mapreduce/OutputJobInfo.java 1149353 > trunk/src/java/org/apache/hcatalog/pig/HCatEximStorer.java 1149353 > trunk/src/java/org/apache/hcatalog/pig/HCatStorer.java 1149353 > trunk/src/java/org/apache/hcatalog/pig/PigHCatUtil.java 1149353 > trunk/src/java/org/apache/hcatalog/rcfile/RCFileMapReduceOutputFormat.java > 1149353 > trunk/src/java/org/apache/hcatalog/rcfile/RCFileOutputDriver.java 1149353 > trunk/src/test/org/apache/hcatalog/mapreduce/HCatMapReduceTest.java 1149353 > > trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatDynamicPartitioned.java > PRE-CREATION > trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatEximInputFormat.java > 1149353 > trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatEximOutputFormat.java > 1149353 > trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatNonPartitioned.java > 1149353 > trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatOutputFormat.java > 1149353 > trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatPartitioned.java > 1149353 > trunk/src/test/org/apache/hcatalog/pig/TestHCatStorer.java 1149353 > > Diff: https://reviews.apache.org/r/1184/diff > > > Testing > ------- > > Unit tests are included > > > Thanks, > > Ashutosh > >
