Hi,

Yes, we did all of those things.
Spark has the correct Hive metastore URI set and also has the right credentials 
for S3 (where the data is actually stored).
The main problem is that when we try to fetch data from any table in any 
database, we get a FileNotFoundException:

Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File s3://XXXXXX-common/XXXX_dm/XXXX_trip_details/ctp-20180423t221106.941z-58moytj7/bk_date=2016-12-13 does not exist.

I checked on S3 and it does exist, although there is an additional level after 
‘bk_date=2016-12-13’. The complete path is as follows:
s3://XXXXXX-common/XXXX_dm/XXXX_trip_details/ctp-20180423t221106.941z-58moytj7/bk_date=2016-12-13/xyz

Has anyone tested the Docker image with S3 instead of HDFS?


Thanks,

Enrico
From: Lionel Liu <[email protected]>
Date: Friday, April 13, 2018 at 10:20 AM
To: "[email protected]" <[email protected]>, 
Enrico D'Urso <[email protected]>
Subject: Re: Griffin on Docker - modify Hive metastore Uris

Hi Enrico,


I think you need to copy hive-site.xml into the Spark config directory, or 
explicitly pass hive-site.xml on the spark-shell command line.
spark-shell creates its sqlContext at startup, so calling setConf after that 
has no effect.
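
A minimal sketch of the first option, assuming the only setting that needs to 
change is hive.metastore.uris. The default directory and the thrift URI below 
are placeholders, not values from this thread; for real use, SPARK_CONF_DIR 
would point at the Spark config directory inside the container (typically 
$SPARK_HOME/conf).

```shell
#!/bin/sh
# Placeholders (assumptions): override both with your real values.
SPARK_CONF_DIR="${SPARK_CONF_DIR:-./spark-conf-demo}"
METASTORE_URI="${METASTORE_URI:-thrift://metastore.example.internal:9083}"

mkdir -p "$SPARK_CONF_DIR"

# Write a minimal hive-site.xml that carries only the metastore URI.
# Note: this overwrites any hive-site.xml already in that directory.
cat > "$SPARK_CONF_DIR/hive-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>${METASTORE_URI}</value>
  </property>
</configuration>
EOF

echo "wrote $SPARK_CONF_DIR/hive-site.xml"
```

spark-shell has to be restarted afterwards so that the sqlContext is created 
with the new setting.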



Thanks,
Lionel

On Thu, Apr 12, 2018 at 6:04 PM, Enrico D'Urso <[email protected]> wrote:
Hi,

After further investigation, I noticed that Spark points to the US East AWS 
region by default.
Any suggestions on how to force it to use us-west-2?

Thanks

From: Enrico D'Urso <[email protected]>
Date: Wednesday, April 11, 2018 at 3:55 PM
To: Lionel Liu <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: Griffin on Docker - modify Hive metastore Uris

Hi Lionel,

Thank you for your email.

Right now, I am testing the Spark cluster using the spark-shell available in 
your Docker image. I wanted to test it before running any ‘measure job’ so 
that I can tackle any configuration issues first.
I start the shell as follows:
spark-shell --deploy-mode client --master yarn \
  --packages=org.apache.hadoop:hadoop-aws:2.6.5

I am fetching hadoop-aws:2.6.5 because 2.6.5 is the Hadoop version included 
in the Docker image.
So far, so good. Then I set the right Hive metastore URI:
sqlContext.setConf("hive.metastore.uris", metastoreURI)

The problem arises when I try to fetch any table, for instance:
sqlContext.sql("Select * from hcom_data_prod_.testtable").take(2)

The table does exist, but I get this error back:

Caused by: java.io.FileNotFoundException: File s3://hcom-xxXXXxx/yyy/testtable/sentdate=2017-10-13 does not exist.

But it does exist; AWS is simply responding with an HTTP 404.
I think I would get the same error if I tried to run any ‘measure job’, so I 
would rather tackle this first.

Are you aware of any S3 endpoint misconfiguration issues with old versions of 
hadoop-aws?

Many thanks,

Enrico


From: Lionel Liu <[email protected]>
Date: Wednesday, April 11, 2018 at 3:34 AM
To: "[email protected]" <[email protected]>, Enrico D'Urso <[email protected]>
Subject: Re: Griffin on Docker - modify Hive metastore Uris

Hi Enrico,

The Griffin service only needs to get metadata from the Hive metastore 
service; it doesn't actually fetch Hive table data.
The Griffin measure module, which runs on the Spark cluster, does need to 
fetch Hive table data, so you need to pass the AWS credentials to it when you 
submit the job. I recommend trying the shell-submit way to submit the measure 
module first.
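
One way to forward the credentials at submit time is through Spark's 
spark.hadoop.* passthrough, which copies the suffix into the Hadoop 
configuration. The helper below is only a sketch: s3a_conf_args is a 
hypothetical name, and it assumes the table locations are read through the 
s3a connector (fs.s3a.access.key / fs.s3a.secret.key). If the locations use a 
different S3 scheme, the property names will differ.

```shell
# Hypothetical helper: emit --conf flags that forward AWS credentials from
# the environment to the Hadoop configuration of the submitted Spark job.
# Assumes the s3a connector; other S3 schemes use other property names.
s3a_conf_args() {
  # Fail early if either variable is missing or empty.
  : "${AWS_ACCESS_KEY_ID:?AWS_ACCESS_KEY_ID is not set}"
  : "${AWS_SECRET_ACCESS_KEY:?AWS_SECRET_ACCESS_KEY is not set}"
  printf '%s %s' \
    "--conf spark.hadoop.fs.s3a.access.key=${AWS_ACCESS_KEY_ID}" \
    "--conf spark.hadoop.fs.s3a.secret.key=${AWS_SECRET_ACCESS_KEY}"
}
```

The output can then be spliced, unquoted, into the submit command, for 
example: spark-submit $(s3a_conf_args) followed by your usual measure job 
arguments. Keep in mind that flags passed this way are visible in the process 
list on the submitting machine.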



Thanks,
Lionel

On Tue, Apr 10, 2018 at 9:48 PM, Enrico D'Urso <[email protected]> wrote:
Hi,

I have just set up the Griffin Docker image and it seems to work OK; I am able 
to view the sample data that comes by default.

Now I would like to try out the metrics against a subset of a table that I 
have in our Hive instance.
The configuration is as follows:
- Hive metastore on RDS (MySQL on Amazon)
- Actual data on Amazon S3

The machine on which Docker is running has access to the metastore and should 
also be able to fetch data from S3.

I connected to the running Docker container and am now looking at the following file:
/root/service/config/application.properties

in which I see the hive.metastore.uris property, which I could modify.
I would also need to pass the AWS credentials to Griffin so that it can fetch 
data from S3.

Does anyone have experience with this?

Thanks,

E.
