> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/build.xml, line 110
> > <https://reviews.apache.org/r/1184/diff/1/?file=26836#file26836line110>
> >
> >     Instead of checking it in lib, is it possible to pull this lib through 
> > ivy ?

I don't know if this is exported by any particular artifact. It was part of 
hadoop-core, but as of 0.20.203/CDH3, it is not present there. Back in hadoop 
trunk and 0.20.204 onwards, it's supposed to have made a comeback. For now, 
better to keep it here for now and then pull it out once the primary hadoop 
version we depend on has it, I think.


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/common/HCatConstants.java, lines 67-69
> > <https://reviews.apache.org/r/1184/diff/1/?file=26838#file26838line67>
> >
> >     Do we need these keys? Looks like you are propagating the token already 
> > acquired in the first job to the second one.

This is done to mimic our usage of the hive delegation token, because we need 
to refer to it to be able to cancel this.

It might be possible to experiment with pulling the credentials out of the 
jobcontext and after checking .getKind() to see if it's a MR token, and 
checking whether or not we set it as opposed to it having been set by oozie or 
other external clients, but this was the easy way of it. It's worth 
experimenting with a secure cluster to optimize this.


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/common/HCatConstants.java, line 78
> > <https://reviews.apache.org/r/1184/diff/1/?file=26838#file26838line78>
> >
> >     Will it be better to externalize this property? Otherwise one has to 
> > recompile hcatalog turn this off.

The current idea is that this check is not really necessary, and it has to be 
an architectural discussion before we decide to enable it again. I used a 
boolean setting and if-guards instead of outright commenting out the code or 
not having it there so as to reverse it later on. If we discuss this and decide 
to have this feature enabled or configurable, we can make it configurable.


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/common/HCatUtil.java, line 317
> > <https://reviews.apache.org/r/1184/diff/1/?file=26839#file26839line317>
> >
> >     Do we need all these methods ? Looks like you no longer need not to 
> > acquire token from JT.

I do need to acquire the token from the JobTracker and I do use it - it is 
being used even if not explicitly set - that's what the exporting of the tokens 
dir does.


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/har/HarOutputCommitterPostProcessor.java,
> >  line 41
> > <https://reviews.apache.org/r/1184/diff/1/?file=26840#file26840line41>
> >
> >     Commented code. Either remove it or uncomment it. I think these logging 
> > statements should be enabled.

(refer above to needing to pull logging out to a separate class, then we can 
stop instantiating static LOGs altogether)


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java, line 
> > 91
> > <https://reviews.apache.org/r/1184/diff/1/?file=26845#file26845line91>
> >
> >     Looks like you are not using this map later on. We should probably 
> > remove it.

Agreed, the functionality was replaced by discoverPartitions(), since we needed 
more info than this provided. This I'll remove and update the patch.

(TODO:remove baseCommitters)


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/har/HarOutputCommitterPostProcessor.java,
> >  lines 135-145
> > <https://reviews.apache.org/r/1184/diff/1/?file=26840#file26840line135>
> >
> >     Commented code.

I figured it might be useful in the future if we need to debug whether or not 
the har is valid. This I can live without.

(TODO:remove commented code)


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/common/HCatUtil.java, lines 329-366
> > <https://reviews.apache.org/r/1184/diff/1/?file=26839#file26839line329>
> >
> >     Looks little unnatural to do logging this way. We should take advantage 
> > of the logging library to accomplish these tasks.

Agreed on that. :)

We should pull out a bunch of these log functions into a proper log class, and 
also start using log4j proper instead of of only when we debug. Not immediately 
in the scope of this patch, I'd suggest.


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java, line 
> > 108
> > <https://reviews.apache.org/r/1184/diff/1/?file=26845#file26845line108>
> >
> >     This I think is because you don't have a list of base committers 
> > initially. But it may be possible to get that list from HCatRecordWriter at 
> > this point and then call commitTask() on them. Calling commitTask from 
> > close() looks awkward, if there is a way we should avoid that.

There isn't a really good way for HCatRecordWriter to communicate this to the 
HCatOutputCommitter. The current approach comes closest to what we consider the 
equivalent of the underlying committer's commitTask(), and the RecordWriter is 
a good place to put it, because that is very much in the task scope.


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java, 
> > lines 132-137
> > <https://reviews.apache.org/r/1184/diff/1/?file=26845#file26845line132>
> >
> >     Instead of special casing for dynamic partitioning we should always 
> > call baseComitter.setupTask() here. In case of dynamic partitions, this 
> > will become noop from here and latter call it again from write() to do the 
> > real work.

Again, as above, there is no way for us to know the baseCommitter in 
HCatOutputCommitter in a dynamic partitioning usecase.


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java, 
> > lines 179-185
> > <https://reviews.apache.org/r/1184/diff/1/?file=26845#file26845line179>
> >
> >     Looks like this isn't needed anymore.

This is still needed and used. Without getting the jobclient token, we cannot 
launch the har task.


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java, line 
> > 143
> > <https://reviews.apache.org/r/1184/diff/1/?file=26845#file26845line143>
> >
> >     Can you explain the purpose of this boolean? Looks like it can be 
> > derived from existing variables.

It serves as multiple-call guard for discoverPartitions() which can be a costly 
call to make, and calling multiple times should be avoided if it can be helped.

One thing I will agree with - it's cleaner to put this callguard inside 
discoverPartitions and have it be in only one place rather than in other 
functions that call it. I can make that change.

(TODO: move partitionsDiscovered check to inside discoverPartitions() )


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java, line 
> > 641
> > <https://reviews.apache.org/r/1184/diff/1/?file=26845#file26845line641>
> >
> >     Why we depend on this variable? Shouldn't we always do fs.exists() 
> > check?

There are two issues here.

a) The first is needing to check before a move, and fs.rename will fail in 
cases where fs.exists would have alerted us. So yes, we do always do a check, 
and we're good.
b) The second is an ability to do a check for all the moves that we would do, 
to see if any of them would have failed, before we even begin trying to move. 
This is needed especially for the case where we har, because that will not have 
the luxury of falling back on the add_partitions() failing before we do the 
move, and so, if there are any existing partitions in the directory, then it's 
needed that we know before we start trying to copy and fail mid-way. The dryRun 
flag gives us that ability to pre-check before we begin moving.


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitterPostProcessor.java,
> >  line 32
> > <https://reviews.apache.org/r/1184/diff/1/?file=26846#file26846line32>
> >
> >     I wonder if we are too early in introducing this interface. There is 
> > only one impl which is always called (in case of dynamic partitions). I 
> > think we should introduce interfaces only if there is a known use case for 
> > it. Further when other use cases do come up, good likelihood that we need 
> > to tweak the interface to accomodate them.

This I agree with, and that's why I didn't attempt to make the usage so generic 
as to allow other PostProcessors beyond har. The reason it's there at all is 
because I wanted to keep the har code pluggable if we wanted to modify later. 
I'm okay with removing the interface and just using the HarPostProcessor itself 
as is without the @Overrides

(TODO: discuss and potentially remove)


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/pig/HCatStorer.java, lines 112-118
> > <https://reviews.apache.org/r/1184/diff/1/?file=26853#file26853line112>
> >
> >     Looks like some of these are not required.

No, as in comments above, we still do use it.


> On 2011-07-22 18:01:31, Ashutosh Chauhan wrote:
> > trunk/src/java/org/apache/hcatalog/pig/PigHCatUtil.java, lines 409-420
> > <https://reviews.apache.org/r/1184/diff/1/?file=26854#file26854line409>
> >
> >     retrieve.. do you want to rename it to getFromUDF..

True. Unnecessary obfuscation. Sorry! :)

(TODO:rename)


- Sushanth


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1184/#review1167
-----------------------------------------------------------


On 2011-07-22 16:09:41, Ashutosh Chauhan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/1184/
> -----------------------------------------------------------
> 
> (Updated 2011-07-22 16:09:41)
> 
> 
> Review request for hcatalog.
> 
> 
> Summary
> -------
> 
> HCatalog-42 review request on behalf of Sushanth
> 
> 
> This addresses bug HCATALOG-42.
>     https://issues.apache.org/jira/browse/HCATALOG-42
> 
> 
> Diffs
> -----
> 
>   trunk/build.xml 1149353 
>   trunk/src/java/org/apache/hcatalog/common/ErrorType.java 1149353 
>   trunk/src/java/org/apache/hcatalog/common/HCatConstants.java 1149353 
>   trunk/src/java/org/apache/hcatalog/common/HCatUtil.java 1149353 
>   trunk/src/java/org/apache/hcatalog/har/HarOutputCommitterPostProcessor.java 
> PRE-CREATION 
>   trunk/src/java/org/apache/hcatalog/mapreduce/HCatBaseOutputCommitter.java 
> 1149353 
>   trunk/src/java/org/apache/hcatalog/mapreduce/HCatBaseOutputFormat.java 
> 1149353 
>   trunk/src/java/org/apache/hcatalog/mapreduce/HCatEximOutputCommitter.java 
> 1149353 
>   trunk/src/java/org/apache/hcatalog/mapreduce/HCatEximOutputFormat.java 
> 1149353 
>   trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java 
> 1149353 
>   
> trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitterPostProcessor.java
>  PRE-CREATION 
>   trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java 1149353 
>   trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputStorageDriver.java 
> 1149353 
>   trunk/src/java/org/apache/hcatalog/mapreduce/HCatRecordWriter.java 1149353 
>   trunk/src/java/org/apache/hcatalog/mapreduce/HCatTableInfo.java 1149353 
>   trunk/src/java/org/apache/hcatalog/mapreduce/OutputJobInfo.java 1149353 
>   trunk/src/java/org/apache/hcatalog/pig/HCatEximStorer.java 1149353 
>   trunk/src/java/org/apache/hcatalog/pig/HCatStorer.java 1149353 
>   trunk/src/java/org/apache/hcatalog/pig/PigHCatUtil.java 1149353 
>   trunk/src/java/org/apache/hcatalog/rcfile/RCFileMapReduceOutputFormat.java 
> 1149353 
>   trunk/src/java/org/apache/hcatalog/rcfile/RCFileOutputDriver.java 1149353 
>   trunk/src/test/org/apache/hcatalog/mapreduce/HCatMapReduceTest.java 1149353 
>   
> trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatDynamicPartitioned.java 
> PRE-CREATION 
>   trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatEximInputFormat.java 
> 1149353 
>   trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatEximOutputFormat.java 
> 1149353 
>   trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatNonPartitioned.java 
> 1149353 
>   trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatOutputFormat.java 
> 1149353 
>   trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatPartitioned.java 
> 1149353 
>   trunk/src/test/org/apache/hcatalog/pig/TestHCatStorer.java 1149353 
> 
> Diff: https://reviews.apache.org/r/1184/diff
> 
> 
> Testing
> -------
> 
> Unit tests are included
> 
> 
> Thanks,
> 
> Ashutosh
> 
>

Reply via email to