Hi Enrico,

If you've modified hive-site.xml, you need to update the copy in HDFS as well, because spark-defaults.conf sets "spark.yarn.dist.files hdfs:///home/spark_conf/hive-site.xml":

    hadoop fs -rm /home/spark_conf/hive-site.xml
    hadoop fs -put $HIVE_HOME/conf/hive-site.xml /home/spark_conf/
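If you want to double-check that YARN will distribute the refreshed file, you can simply list it (same path as in spark-defaults.conf above):

    hadoop fs -ls /home/spark_conf/hive-site.xml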
Then, if you modified Griffin's application.properties, you need to restart the Griffin service so that it re-reads application.properties:

1. Get the pid of the Griffin service:
   ps -ef | grep "service.jar"
2. Kill the Griffin service:
   kill -9 <pid>
3. Start the Griffin service again:
   cd ~/service/
   nohup java -jar service.jar > service.log &
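If you prefer to do steps 1-3 in one shot, something like this should also work (just a sketch; it assumes service.jar lives in ~/service/ as above):

    # find the running Griffin service, stop it, then relaunch it in the background
    pid=$(ps -ef | grep "[s]ervice.jar" | awk '{print $2}' | head -n 1)
    [ -n "$pid" ] && kill -9 "$pid"
    cd ~/service/ && nohup java -jar service.jar > service.log 2>&1 &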
After about 2 minutes the service starts up, and you can refresh the UI.

Thanks,
Lionel

On Thu, May 3, 2018 at 7:00 PM, Enrico D'Urso <[email protected]> wrote:

> Hi,
>
> I think I fixed the S3 issue.
> Basically, I added the following line in /apache/hadoop-2.6.5/etc/hadoop/core-site.xml:
>
> <property>
>   <name>fs.s3.impl</name>
>   <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
> </property>
>
> Now, the NOT FOUND error is gone!
>
> I think this should be enough to start playing with Griffin using our real data.
> Correct me if I'm wrong, but I think I can create my own measure config file and then submit the job to Spark.
> My question here is:
> I modified the metastore URI in hive-site.xml (hive directory), hive-site.xml (Spark conf directory), and finally also in /root/service/config/application.properties, but in the UI I still see the old data. Do I need to restart some Griffin services to force them to re-read the above config files?
>
> Thanks,
>
> Enrico
>
> On 4/26/18, 9:41 AM, "Enrico D'Urso" <[email protected]> wrote:
>
> Hi,
>
> That is ok, no problem.
> I will update you if I am able to fix the issue.
>
> Thanks,
>
> Enrico
>
> On 4/26/18, 2:43 AM, "William Guo" <[email protected]> wrote:
>
> hi Enrico,
>
> Honestly, it is a little difficult for us to set up the environment on AWS for now, since we are using HDFS.
> But we will figure out how to support AWS and post the status here.
>
> For now, we are busy with the release license issue.
> After we have released 0.2.0, we will create a task for AWS.
>
> BTW, I am not sure whether this 'application/xml' is right or not:
>
> '''
> 18/04/25 14:13:02 DEBUG http.wire: << "Content-Type: application/xml[\r][\n]"
> '''
>
> Thanks,
> William
>
> On Wed, Apr 25, 2018 at 10:23 PM, Enrico D'Urso <[email protected]> wrote:
>
> > Hi guys,
> >
> > Thank you for your email.
> > My company is pretty interested in using Griffin (and maybe contributing to the code), but being able to use it with S3 (AWS in general) instead of HDFS is a crucial point.
> > Let me share my configuration with you; I hope this can help to troubleshoot the issue. If it does not, I believe we can organize a call where I can share my screen.
> >
> > Let's start with core-site.xml in the following directory:
> > root@griffin:/apache/spark/conf#
> >
> > So it is the one that Spark uses. Here is the complete xml:
> > https://paste.ofcode.org/cfZFkRcGPsPshhew6X6HPL
> > However, the important item is:
> >
> > <property>
> >   <name>hive.metastore.uris</name>
> >   <value>thrift://shared-XXXXX-dance.us-west-2.hcom-sandbox-aws.aws.hcom:48869</value>
> >   <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
> > </property>
> >
> > which works fine, as I can see DBs and tables when using spark-shell.
> >
> > The second file I modified is core-site.xml here:
> > root@griffin:/apache/hadoop-2.6.5/etc/hadoop#
> > The complete file is here: https://paste.ofcode.org/k3HZqb6gEDhJd8XM9Pv45u
> > But the important point is:
> >
> > <property>
> >   <name>fs.s3.awsAccessKeyId</name>
> >   <value>XXXXX</value>
> > </property>
> > <property>
> >   <name>fs.s3.awsSecretAccessKey</name>
> >   <value>XXXXXX</value>
> > </property>
> >
> > The values are masked, but I can confirm that they are correct, as it is able to authenticate with AWS.
> >
> > Finally, this is the way I run spark-shell:
> > spark-shell --deploy-mode client --master yarn --packages=org.apache.hadoop:hadoop-aws:2.6.5,com.amazonaws:aws-java-sdk:1.7.4
> > Please note the packages flag, which downloads the packages required to connect to AWS.
> >
> > Once the spark-shell is opened, I have no problem viewing the DBs:
> > sqlContext.sql("show databases").collect().foreach(println(_))
> > It works and the result is correct.
> > Then when I try to select any table:
> > sqlContext.sql("Select * from XX.YY").take(2)
> >
> > I get the error:
> > Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File s3://bucketName/XX/YY/sentdate=2018-01-14 does not exist.
> >         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> >         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> >         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
> >         ... 93 more
> > Caused by: java.io.FileNotFoundException: File s3://hcom-data-prod-users/user_tech/email_testing/sentdate=2018-01-14 does not exist.
> >         at org.apache.hadoop.fs.s3.S3FileSystem.listStatus(S3FileSystem.java:195)
> >         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
> >         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
> >         at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1682)
> >         at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1681)
> >         at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1664)
> >         at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedStatus(Hadoop23Shims.java:667)
> >         at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:361)
> >         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:634)
> >         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:620)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >         at java.lang.Thread.run(Thread.java:748)
> >
> > In fact, enabling debug mode, I see the HTTP header request and response:
> >
> > 18/04/25 14:13:02 DEBUG conn.DefaultClientConnection: Sending request: GET /XXX/YYY%2Fsentdate%3D2018-01-14 HTTP/1.1
> > 18/04/25 14:13:02 DEBUG http.wire: >> "GET /%2Fuser_tech%2Femail_testing%2Fsentdate%3D2018-01-14 HTTP/1.1[\r][\n]"
> > 18/04/25 14:13:02 DEBUG http.wire: >> "Date: Wed, 25 Apr 2018 14:13:02 GMT[\r][\n]"
> > 18/04/25 14:13:02 DEBUG http.wire: >> "Host: hcom-MASK-users.s3.amazonaws.com:443[\r][\n]"
> > 18/04/25 14:13:02 DEBUG http.wire: >> "Connection: Keep-Alive[\r][\n]"
> > 18/04/25 14:13:02 DEBUG http.wire: >> "User-Agent: JetS3t/0.9.3 (Linux/4.9.81-35.56.amzn1.x86_64; amd64; en; JVM 1.8.0_131)[\r][\n]"
> > 18/04/25 14:13:02 DEBUG http.wire: >> "[\r][\n]"
> > 18/04/25 14:13:02 DEBUG http.headers: >> GET /XXX/YYY%2Fsentdate%3D2018-01-14 HTTP/1.1
> > 18/04/25 14:13:02 DEBUG http.headers: >> Date: Wed, 25 Apr 2018 14:13:02 GMT
> > 18/04/25 14:13:02 DEBUG http.headers: >> Host: hcom-MASK-prod-users.s3.amazonaws.com:443
> > 18/04/25 14:13:02 DEBUG http.headers: >> Connection: Keep-Alive
> > 18/04/25 14:13:02 DEBUG http.headers: >> User-Agent: JetS3t/0.9.3 (Linux/4.9.81-35.56.amzn1.x86_64; amd64; en; JVM 1.8.0_131)
> > 18/04/25 14:13:02 DEBUG http.wire: << "HTTP/1.1 404 Not Found[\r][\n]"
> > 18/04/25 14:13:02 DEBUG http.wire: << "Content-Type: application/xml[\r][\n]"
> > 18/04/25 14:13:02 DEBUG http.wire: << "Transfer-Encoding: chunked[\r][\n]"
> > 18/04/25 14:13:02 DEBUG http.wire: << "Date: Wed, 25 Apr 2018 14:13:01 GMT[\r][\n]"
> > 18/04/25 14:13:02 DEBUG http.wire: << "Server: AmazonS3[\r][\n]"
> > 18/04/25 14:13:02 DEBUG http.wire: << "[\r][\n]"
> >
> > I hope it can help.
> >
> > Thanks,
> >
> > Enrico
> >
> > On 4/25/18, 2:32 AM, "William Guo" <[email protected]> wrote:
> >
> > hi Enrico,
> >
> > We don't know why AWS responds with 404; could you share your log so we can troubleshoot?
> > BTW, can we access your AWS instance? That would help us find the issue.
> >
> > Thanks,
> > William
> >
> > On Wed, Apr 25, 2018 at 12:08 AM, Enrico D'Urso <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > Yes, we did all of those things.
> > > Spark has the correct Hive metastore URI set and also has the right credentials for S3 (where the data is actually stored).
> > > The main problem is that when trying to fetch data from any table / any DB we get a File not found exception:
> > >
> > > Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File s3://XXXXXX-common/XXXX_dm/XXXX_trip_details/ctp-20180423t221106.941z-58moytj7/bk_date=2016-12-13 does not exist.
> > >
> > > I checked on S3 and it does exist, although there is an additional level after 'bk_date=2016-12-13'. The complete path is as follows:
> > > s3://XXXXXX-common/XXXX_dm/XXXX_trip_details/ctp-20180423t221106.941z-58moytj7/bk_date=2016-12-13/xyz
> > >
> > > Has anyone tested the Docker image to work with S3 instead of HDFS?
> > >
> > > Thanks,
> > >
> > > Enrico
> > >
> > > From: Lionel Liu <[email protected]>
> > > Date: Friday, April 13, 2018 at 10:20 AM
> > > To: "[email protected]" <[email protected]>, Enrico D'Urso <[email protected]>
> > > Subject: Re: Griffin on Docker - modify Hive metastore Uris
> > >
> > > Hi Enrico,
> > >
> > > I think you need to copy hive-site.xml into the Spark config directory, or explicitly set hive-site.xml on the spark-shell command line.
> > > Because spark-shell creates its sqlContext at startup; after that, setConf will not work.
> > >
> > > Thanks,
> > > Lionel
> > >
> > > On Thu, Apr 12, 2018 at 6:04 PM, Enrico D'Urso <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > After further investigation, I noticed that Spark is pointing to the east AWS region by default.
> > > Any suggestion to force it to use us-west-2?
> > > Thanks
> > >
> > > From: Enrico D'Urso <[email protected]>
> > > Date: Wednesday, April 11, 2018 at 3:55 PM
> > > To: Lionel Liu <[email protected]>, "[email protected]" <[email protected]>
> > > Subject: Re: Griffin on Docker - modify Hive metastore Uris
> > >
> > > Hi Lionel,
> > >
> > > Thank you for your email.
> > >
> > > Right now, I am testing the Spark cluster using the spark-shell available on your Docker image. I just wanted to test it before running any 'measure job', to tackle any configuration issue.
> > > I start the shell as follows:
> > > spark-shell --deploy-mode client --master yarn --packages=org.apache.hadoop:hadoop-aws:2.6.5
> > >
> > > I am fetching hadoop-aws:2.6.5 as 2.6.5 is the Hadoop version that is included in the Docker image.
> > > So far, so good. Then I also set the right Hive metastore URI:
> > > sqlContext.setConf("hive.metastore.uris", metastoreURI)
> > >
> > > The problem arises when I try to fetch any table, for instance:
> > > sqlContext.sql("Select * from hcom_data_prod_.testtable").take(2)
> > >
> > > The table does exist, but I get an error back saying that:
> > >
> > > Caused by: java.io.FileNotFoundException: File s3://hcom-xxXXXxx/yyy/testtable/sentdate=2017-10-13 does not exist.
> > >
> > > But it does exist; basically AWS is responding with a 404 HTTP message.
> > > I think I would get the same error if I try to run any 'measure job', so I prefer to tackle this earlier.
> > >
> > > Are you aware of any S3 endpoint misconfiguration with old versions of hadoop-aws?
> > >
> > > Many thanks,
> > >
> > > Enrico
> > >
> > > From: Lionel Liu <[email protected]>
> > > Date: Wednesday, April 11, 2018 at 3:34 AM
> > > To: "[email protected]" <[email protected]>, Enrico D'Urso <[email protected]>
> > > Subject: Re: Griffin on Docker - modify Hive metastore Uris
> > >
> > > Hi Enrico,
> > >
> > > The Griffin service only needs to get metadata from the Hive metastore service; it doesn't actually fetch Hive table data.
> > > Griffin measure, which runs on the Spark cluster, does need to fetch Hive table data, so you need to pass the AWS credentials to it when you submit. I recommend you try the shell-submit way to submit the measure module first.
> > >
> > > Thanks,
> > > Lionel
> > >
> > > On Tue, Apr 10, 2018 at 9:48 PM, Enrico D'Urso <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > I have just set up the Griffin Docker image and it seems to work ok; I am able to view the sample data that comes by default.
> > >
> > > Now, I would like to test the metrics a bit against a subset of a table that I have in our Hive instance.
> > > In particular, the configuration is as follows:
> > > - Hive Metastore on RDS (MySQL on Amazon)
> > > - Actual data on Amazon S3
> > >
> > > The machine on which Docker is running has access to the metastore and can also potentially fetch data from S3.
> > >
> > > I connected into the Docker image and now I am checking the following file:
> > > /root/service/config/application.properties
> > >
> > > in which I see the hive.metastore.uris that I can potentially modify.
> > > I would also need to pass to Griffin the AWS credentials to be able to fetch data from S3.
> > >
> > > Anyone has experience on this?
> > >
> > > Thanks,
> > >
> > > E.
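For the last question quoted above (passing the AWS credentials to the measure job), one option is Spark's spark.hadoop.* passthrough at submit time, so the keys don't have to live only in core-site.xml. This is only a sketch: the main class, measure jar and the two JSON config paths are placeholders, and the fs.s3.* property names are the same ones used earlier in this thread:

    # placeholders: <measure.main.Class>, <path/to/measure.jar>, <env.json>, <config.json>
    spark-submit --master yarn --deploy-mode client \
      --packages org.apache.hadoop:hadoop-aws:2.6.5,com.amazonaws:aws-java-sdk:1.7.4 \
      --conf spark.hadoop.fs.s3.awsAccessKeyId=XXXXX \
      --conf spark.hadoop.fs.s3.awsSecretAccessKey=XXXXX \
      --class <measure.main.Class> <path/to/measure.jar> <env.json> <config.json>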
