I'm trying to update the Hadoop dependencies to the recent 3.3.0 release and I've 
encountered a problem: the Hadoop-related checks seem to work without any 
further changes, but HCatalog needs to be bumped to a 3.x.y version as well 
(2.x.y versions require Hadoop 2.x.y).

When I use Hadoop 3.3.0 there is a Guava jar version related exception [1], 
which I tried to solve by enforcing Guava 27.0-jre (the version used by Hadoop 
3.3.0) - without success.
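
For reference, what I mean by "enforcing" is pinning the Guava version in the build, roughly along these lines (this is a generic Maven-style sketch, not the exact change I made; in Gradle the equivalent would be a resolutionStrategy force):

```xml
<!-- Sketch only: pin Guava to the version Hadoop 3.3.0 uses.
     The module this belongs in is omitted here. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>27.0-jre</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

The NoSuchMethodError in [1] suggests an older Guava still wins on the classpath, since the primitive-argument checkArgument overloads only exist in newer Guava releases.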

Then I used Hadoop 3.2.0, which doesn't have the Guava update, together with 
Hive 3.1.2. I also replaced hive-site.xml with the recent default one from 
Hive's master branch. With that, 4 tests from io/hcatalog are failing: 
testWriteThenReadSuccess - with exception [2]
testWriteThenUnboundedReadSuccess - with the same exception.

As far as I can tell, the error message is a bit misleading: setOutput is indeed 
called in HCatalogIO.Write's writerContext = masterWriter.prepareWrite(), which 
under the hood tries to call setOutput and fails.

The probable cause is the HCatalog configuration, but I definitely lack the 
knowledge to set it up, and HCatalog's 3.x documentation really doesn't help.
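
For context, my current (possibly wrong) understanding is that HCatalog picks up the metastore connection from hive-site.xml, so the NullPointerException in [2] might mean something like this is missing or not being read (property name is the standard Hive one, the value is just a placeholder):

```xml
<!-- Placeholder values; this is the standard Hive metastore property,
     not something I've verified fixes the NPE. -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>
```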

Do we have anyone with some knowledge about HCatalog that could help me with 
this?


[1] NoSuchMethodError: 
com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;J)V
[2] org.apache.beam.sdk.util.UserCodeException: 
org.apache.hive.hcatalog.common.HCatException : 2004 : HCatOutputFormat not 
initialized, setOutput has to be called. Cause : 
org.apache.hive.hcatalog.common.HCatException : 2001 : Error setting output 
information. Cause : java.lang.NullPointerException
