Hi Enrico,

If you've modified hive-site.xml, you also need to update the copy in HDFS
as follows, because spark-defaults.conf sets "spark.yarn.dist.files
hdfs:///home/spark_conf/hive-site.xml":
hadoop fs -rm /home/spark_conf/hive-site.xml
hadoop fs -put $HIVE_HOME/conf/hive-site.xml /home/spark_conf/
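
To double-check the upload, you can list the file and confirm its timestamp
(same paths as above):
hadoop fs -ls /home/spark_conf/hive-site.xml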

Then, if you modified griffin's application.properties, you need to restart
the griffin service so that it re-reads the file:
1. Get the pid of the griffin service:
ps -ef | grep "service.jar"
2. Kill that pid:
kill -9 <pid>
3. Start the griffin service again:
cd ~/service/
nohup java -jar service.jar > service.log &
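
If you do this often, the three steps can be wrapped in a small helper
script; a minimal sketch, assuming service.jar lives in ~/service/ as above:
#!/bin/bash
# restart the griffin service so it re-reads application.properties
pid=$(ps -ef | grep "service.jar" | grep -v grep | awk '{print $2}')
[ -n "$pid" ] && kill -9 "$pid"
cd ~/service/ || exit 1
nohup java -jar service.jar > service.log &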

The service takes about 2 minutes to start up; after that, you can refresh the UI.

Thanks,
Lionel

On Thu, May 3, 2018 at 7:00 PM, Enrico D'Urso <[email protected]> wrote:

> Hi,
>
> I think I fixed the S3 issue.
> Basically, I added the following property in
> /apache/hadoop-2.6.5/etc/hadoop/core-site.xml:
>
> <property>
>   <name>fs.s3.impl</name>
>   <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
> </property>
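>
> For what it's worth, the same property can also be set per session instead
> of editing core-site.xml, using Spark's spark.hadoop.* passthrough for
> Hadoop configuration (a sketch, not verified on this image):
> spark-shell --deploy-mode client --master yarn \
>   --conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem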
>
> Now, the NOT FOUND error is gone!
>
> I think this should be enough to start playing with Griffin using our real
> data.
> Correct me if I'm wrong, but I think I can create my own measure config
> file and then submit the job to Spark.
> My question here is:
> I modified the metastore URI in hive-site.xml (hive directory),
> hive-site.xml (Spark conf directory), and finally also in
> /root/service/config/application.properties
> but in the UI I still see the old data. Do I need to restart some griffin
> services to force them to re-read the above config files?
>
> Thanks,
>
> Enrico
>
> On 4/26/18, 9:41 AM, "Enrico D'Urso" <[email protected]> wrote:
>
>     Hi,
>
>     That is ok, no problem.
>     I will update you if I am able to fix the issue.
>
>     Thanks,
>
>     Enrico
>
>     On 4/26/18, 2:43 AM, "William Guo" <[email protected]> wrote:
>
>         hi Enrico,
>
>         Honestly, it is a little difficult for us to set up the environment
>         on AWS for now, since we are using HDFS.
>         But we will figure out how to support AWS and post the status here.
>
>         For now, we are busy with the release license issue.
>         After we have released 0.2.0, we will create a task for AWS.
>
>         BTW,
>
>         I am not sure whether this 'application/xml' is right or not
>
>         '''
>         18/04/25 14:13:02 DEBUG http.wire:  << "Content-Type: application/xml[\r][\n]"
>         '''
>
>         Thanks,
>         William
>
>
>
>
>         On Wed, Apr 25, 2018 at 10:23 PM, Enrico D'Urso <[email protected]> wrote:
>
>         > Hi guys,
>         >
>         > Thank you for your email.
>         > My company is pretty interested in using Griffin (and maybe
>         > contributing to the code), but being able to use it with S3 (AWS in
>         > general) instead of HDFS is a crucial point.
>         > Let me share my configuration with you; I hope it helps to
>         > troubleshoot the issue. If it does not, we can organize a call where
>         > I can share my screen.
>         >
>         > Let’s start with core-site.xml in the following directory:
>         > root@griffin:/apache/spark/conf#
>         >
>         > So it is the one that Spark uses. Here is the complete XML:
>         > https://paste.ofcode.org/cfZFkRcGPsPshhew6X6HPL
>         > However, the important item is:
>         >
>         > <property>
>         >   <name>hive.metastore.uris</name>
>         >   <value>thrift://shared-XXXXX-dance.us-west-2.hcom-sandbox-aws.aws.hcom:48869</value>
>         >   <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
>         > </property>
>         >
>         > which works fine as I can see DBs and tables when using
> spark-shell.
>         >
>         > The second file I modified is core-site.xml here:
>         > root@griffin:/apache/hadoop-2.6.5/etc/hadoop#
>         > Complete file is here: https://paste.ofcode.org/k3HZqb6gEDhJd8XM9Pv45u
>         > But the important point is:
>         > <property>
>         >   <name>fs.s3.awsAccessKeyId</name>
>         >   <value>XXXXX</value>
>         > </property>
>         > <property>
>         >   <name>fs.s3.awsSecretAccessKey</name>
>         >   <value>XXXXXX</value>
>         > </property>
>         >
>         > The values are masked, but I can confirm they are correct, as it is
>         > able to authenticate with AWS.
>         >
>         > Finally, this is the way I run spark-shell:
>         > spark-shell --deploy-mode client --master yarn \
>         >   --packages org.apache.hadoop:hadoop-aws:2.6.5,com.amazonaws:aws-java-sdk:1.7.4
>         > Please note the --packages flag, which downloads the required
>         > packages to connect with AWS.
>         >
>         > Once the spark-shell is open, I have no problem viewing the DBs:
>         > sqlContext.sql("show databases").collect().foreach(println(_))
>         > It works and the result is correct.
>         > Then when I try to select any table:
>         > sqlContext.sql("Select * from XX.YY").take(2)
>         >
>         > I get the error:
>         > Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File s3://bucketName/XX/YY/sentdate=2018-01-14 does not exist.
>         >         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         >         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         >         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
>         >         ... 93 more
>         > Caused by: java.io.FileNotFoundException: File s3://hcom-data-prod-users/user_tech/email_testing/sentdate=2018-01-14 does not exist.
>         >         at org.apache.hadoop.fs.s3.S3FileSystem.listStatus(S3FileSystem.java:195)
>         >         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
>         >         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
>         >         at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1682)
>         >         at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1681)
>         >         at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1664)
>         >         at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:667)
>         >         at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:361)
>         >         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
>         >         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:620)
>         >         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         >         at java.lang.Thread.run(Thread.java:748)
>         >
>         > In fact, enabling debug mode, I can see the HTTP request and
>         > response headers:
>         >
>         > 18/04/25 14:13:02 DEBUG conn.DefaultClientConnection: Sending request: GET /XXX/YYY%2Fsentdate%3D2018-01-14 HTTP/1.1
>         > 18/04/25 14:13:02 DEBUG http.wire:  >> "GET /%2Fuser_tech%2Femail_testing%2Fsentdate%3D2018-01-14 HTTP/1.1[\r][\n]"
>         > 18/04/25 14:13:02 DEBUG http.wire:  >> "Date: Wed, 25 Apr 2018 14:13:02 GMT[\r][\n]"
>         > 18/04/25 14:13:02 DEBUG http.wire:  >> "Host: hcom-MASK-users.s3.amazonaws.com:443[\r][\n]"
>         > 18/04/25 14:13:02 DEBUG http.wire:  >> "Connection: Keep-Alive[\r][\n]"
>         > 18/04/25 14:13:02 DEBUG http.wire:  >> "User-Agent: JetS3t/0.9.3 (Linux/4.9.81-35.56.amzn1.x86_64; amd64; en; JVM 1.8.0_131)[\r][\n]"
>         > 18/04/25 14:13:02 DEBUG http.wire:  >> "[\r][\n]"
>         > 18/04/25 14:13:02 DEBUG http.headers: >> GET /XXX/YYY%2Fsentdate%3D2018-01-14 HTTP/1.1
>         > 18/04/25 14:13:02 DEBUG http.headers: >> Date: Wed, 25 Apr 2018 14:13:02 GMT
>         > 18/04/25 14:13:02 DEBUG http.headers: >> Host: hcom-MASK-prod-users.s3.amazonaws.com:443
>         > 18/04/25 14:13:02 DEBUG http.headers: >> Connection: Keep-Alive
>         > 18/04/25 14:13:02 DEBUG http.headers: >> User-Agent: JetS3t/0.9.3 (Linux/4.9.81-35.56.amzn1.x86_64; amd64; en; JVM 1.8.0_131)
>         > 18/04/25 14:13:02 DEBUG http.wire:  << "HTTP/1.1 404 Not Found[\r][\n]"
>         > 18/04/25 14:13:02 DEBUG http.wire:  << "Content-Type: application/xml[\r][\n]"
>         > 18/04/25 14:13:02 DEBUG http.wire:  << "Transfer-Encoding: chunked[\r][\n]"
>         > 18/04/25 14:13:02 DEBUG http.wire:  << "Date: Wed, 25 Apr 2018 14:13:01 GMT[\r][\n]"
>         > 18/04/25 14:13:02 DEBUG http.wire:  << "Server: AmazonS3[\r][\n]"
>         > 18/04/25 14:13:02 DEBUG http.wire:  << "[\r][\n]"
>         >
>         > I hope it can help.
>         >
>         > Thanks,
>         >
>         > Enrico
>         >
>         >
>         > On 4/25/18, 2:32 AM, "William Guo" <[email protected]> wrote:
>         >
>         >     hi Enrico,
>         >
>         >     We don't know why AWS responds with 404; could you share your
>         >     log with us for troubleshooting?
>         >     BTW, can we access your AWS instance? That would help us find
>         >     the issue.
>         >
>         >
>         >     Thanks,
>         >     William
>         >
>         >     On Wed, Apr 25, 2018 at 12:08 AM, Enrico D'Urso <[email protected]> wrote:
>         >
>         >     > Hi,
>         >     >
>         >     > Yes, we did all of those things.
>         >     > Spark has the correct Hive metastore URI set and also has
> the right
>         >     > credentials for S3 (where the data is actually stored).
>         >     > The main problem is that when trying to fetch data from any
>         >     > table in any DB, we get a FileNotFoundException:
>         >     >
>         >     > Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File s3://XXXXXX-common/XXXX_dm/XXXX_trip_details/ctp-20180423t221106.941z-58moytj7/bk_date=2016-12-13 does not exist.
>         >     >
>         >     > I checked on S3 and it does exist, although there is an
>         >     > additional level after ‘bk_date=2016-12-13’. The complete path
>         >     > is as follows:
>         >     > s3://XXXXXX-common/XXXX_dm/XXXX_trip_details/ctp-20180423t221106.941z-58moytj7/bk_date=2016-12-13/xyz
>         >     >
>         >     > Has anyone tested the Docker image with S3 instead of HDFS?
>         >     >
>         >     >
>         >     > Thanks,
>         >     >
>         >     > Enrico
>         >     > From: Lionel Liu <[email protected]>
>         >     > Date: Friday, April 13, 2018 at 10:20 AM
>         >     > To: "[email protected]" <[email protected]>, Enrico D'Urso <[email protected]>
>         >     > Subject: Re: Griffin on Docker - modify Hive metastore Uris
>         >     >
>         >     > Hi Enrico,
>         >     >
>         >     >
>         >     > I think you need to copy hive-site.xml into the Spark config
>         >     > directory, or explicitly pass hive-site.xml on the spark-shell
>         >     > command line. Because spark-shell creates its sqlContext at
>         >     > startup, calling setConf afterwards will not work.
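>         >     >
>         >     > A minimal sketch of both options (assuming the usual HIVE_HOME
>         >     > and SPARK_HOME layout; adjust the paths to your image):
>         >     > # option 1: copy hive-site.xml into Spark's conf directory
>         >     > cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/
>         >     > # option 2: ship the file explicitly with the shell session
>         >     > spark-shell --deploy-mode client --master yarn --files $HIVE_HOME/conf/hive-site.xml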
>         >     >
>         >     >
>         >     >
>         >     > Thanks,
>         >     > Lionel
>         >     >
>         >     > On Thu, Apr 12, 2018 at 6:04 PM, Enrico D'Urso <[email protected]> wrote:
>         >     > Hi,
>         >     >
>         >     > After further investigation, I noticed that Spark is pointing
>         >     > to the AWS us-east region by default.
>         >     > Any suggestion on how to force it to use us-west-2?
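>         >     >
>         >     > One thing I might try (not verified): since NativeS3FileSystem
>         >     > goes through JetS3t, the endpoint can be set in a
>         >     > jets3t.properties file on the classpath, e.g.:
>         >     > # place jets3t.properties on the driver/executor classpath,
>         >     > # e.g. in $SPARK_HOME/conf/
>         >     > echo "s3service.s3-endpoint=s3-us-west-2.amazonaws.com" > $SPARK_HOME/conf/jets3t.properties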
>         >     >
>         >     > Thanks
>         >     >
>         >     > From: Enrico D'Urso <[email protected]>
>         >     > Date: Wednesday, April 11, 2018 at 3:55 PM
>         >     > To: Lionel Liu <[email protected]>, "[email protected]" <[email protected]>
>         >     > Subject: Re: Griffin on Docker - modify Hive metastore Uris
>         >     >
>         >     > Hi Lionel,
>         >     >
>         >     > Thank you for your email.
>         >     >
>         >     > Right now, I am testing the Spark cluster using the spark-shell
>         >     > available in your Docker image. I just wanted to test it before
>         >     > running any ‘measure job’, to tackle any configuration issues early.
>         >     > I start the shell as follows:
>         >     > spark-shell --deploy-mode client --master yarn
>         >     > --packages=org.apache.hadoop:hadoop-aws:2.6.5
>         >     >
>         >     > I am fetching hadoop-aws:2.6.5, as 2.6.5 is the Hadoop version
>         >     > included in the Docker image.
>         >     > So far, so good. Then I also set the right Hive metastore URI:
>         >     > sqlContext.setConf("hive.metastore.uris", metastoreURI)
>         >     >
>         >     > The problem arises when I try to fetch any table, for instance:
>         >     > sqlContext.sql("Select * from hcom_data_prod_.testtable").take(2)
>         >     >
>         >     > The table does exist, but I get an error back saying:
>         >     >
>         >     > Caused by: java.io.FileNotFoundException: File s3://hcom-xxXXXxx/yyy/testtable/sentdate=2017-10-13 does not exist.
>         >     >
>         >     > But it does exist; basically AWS is responding with an HTTP 404.
>         >     > I think I would get the same error if I tried to run any
>         >     > ‘measure job’, so I prefer to tackle this first.
>         >     >
>         >     > Are you aware of any S3 endpoint misconfiguration with older
>         >     > versions of hadoop-aws?
>         >     >
>         >     > Many thanks,
>         >     >
>         >     > Enrico
>         >     >
>         >     >
>         >     > From: Lionel Liu <[email protected]>
>         >     > Date: Wednesday, April 11, 2018 at 3:34 AM
>         >     > To: "[email protected]" <[email protected]>, Enrico D'Urso <[email protected]>
>         >     > Subject: Re: Griffin on Docker - modify Hive metastore Uris
>         >     >
>         >     > Hi Enrico,
>         >     >
>         >     > The Griffin service only needs to get metadata from the Hive
>         >     > metastore service; it doesn't actually fetch Hive table data.
>         >     > Griffin measure, which runs on the Spark cluster, does need to
>         >     > fetch Hive table data, so you need to pass the AWS credentials
>         >     > to it when you submit. I recommend you try the shell-submit way
>         >     > to submit the measure module first.
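>         >     >
>         >     > For example, a rough sketch of passing the credentials at
>         >     > submit time via Spark's spark.hadoop.* passthrough (the class,
>         >     > jar and json arguments below are placeholders, not the exact
>         >     > griffin artifacts):
>         >     > spark-submit --master yarn --deploy-mode client \
>         >     >   --conf spark.hadoop.fs.s3.awsAccessKeyId=$AWS_ACCESS_KEY_ID \
>         >     >   --conf spark.hadoop.fs.s3.awsSecretAccessKey=$AWS_SECRET_ACCESS_KEY \
>         >     >   --class <measure-main-class> measure.jar env.json config.json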
>         >     >
>         >     >
>         >     >
>         >     > Thanks,
>         >     > Lionel
>         >     >
>         >     > On Tue, Apr 10, 2018 at 9:48 PM, Enrico D'Urso <[email protected]> wrote:
>         >     > Hi,
>         >     >
>         >     > I have just set up the Griffin Docker image and it seems to
>         >     > work OK; I am able to view the sample data that comes with it
>         >     > by default.
>         >     >
>         >     > Now, I would like to test the metrics a bit against a subset
>         >     > of a table that I have in our Hive instance.
>         >     > In particular, the configuration is as follows:
>         >     > - Hive metastore on RDS (MySQL on Amazon)
>         >     > - Actual data on Amazon S3
>         >     >
>         >     > The machine on which Docker is running has access to the
>         >     > metastore and can also potentially fetch data from S3.
>         >     >
>         >     > I connected to the Docker container and am now checking the
>         >     > following file:
>         >     > /root/service/config/application.properties
>         >     >
>         >     > in which I see the hive.metastore.uris property that I can
>         >     > potentially modify.
>         >     > I would also need to pass the AWS credentials to Griffin so
>         >     > that it can fetch data from S3.
>         >     >
>         >     > Does anyone have experience with this?
>         >     >
>         >     > Thanks,
>         >     >
>         >     > E.
>         >     >
>         >     >
>         >
>         >
>         >
>
>
>
>
>
