Re: Griffin Service Error

Lionel Liu Thu, 12 Apr 2018 02:49:15 -0700

Hi Karan,

Is your Hive cluster a kerberised cluster? I suspect the problem is
caused by that.


How about this:
https://stackoverflow.com/questions/47533532/hivemetastoreclient-fails-to-connect-to-a-kerberized-cluster

Griffin uses HiveMetaStoreClient to connect to the Hive metastore service, so
you can test it directly to narrow down this problem.
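Before testing HiveMetaStoreClient itself, it can help to rule out plain network problems. The Python sketch below is an illustrative assumption, not part of Griffin or Hive: it splits a comma-separated hive.metastore.uris value into host/port pairs and checks whether a TCP connection opens. The helper names are hypothetical. It does not speak Thrift, SASL or Kerberos, so a "reachable" result says nothing about whether a kerberised handshake would succeed.

```python
import socket
from typing import List, Tuple
from urllib.parse import urlparse


def parse_metastore_uris(value: str) -> List[Tuple[str, int]]:
    """Split a comma-separated hive.metastore.uris value into (host, port) pairs."""
    endpoints = []
    for raw in value.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate stray or trailing commas
        parsed = urlparse(raw)
        if parsed.scheme != "thrift" or not parsed.hostname or parsed.port is None:
            raise ValueError(f"not a thrift://host:port URI: {raw!r}")
        endpoints.append((parsed.hostname, parsed.port))
    return endpoints


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a plain TCP connection to host:port opens within `timeout` seconds.

    Only rules out network/firewall issues; no Thrift or Kerberos is involved.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False
```

For example, `parse_metastore_uris("thrift://azudpoc2928.ent.lolcentral.com:9083")` yields one (host, port) pair that can be passed to `is_reachable`. If the port is reachable but Griffin still fails, the cause is more likely authentication or server-side limits than the network.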

Thanks,
Lionel

On Thu, Apr 12, 2018 at 2:11 PM, Karan Gupta <[email protected]> wrote:

> Hi Lionel,
>
>
>
> Thank you for the reply.
>
>
>
> I did try increasing hive.server2.thrift.max.worker.threads to 1500
> from the default 500, but it did not resolve the issue. Also, we have 2
> instances of HiveServer2 running on different machines.
>
>
>
> Could you recommend any other workaround?
>
>
>
> Thank you,
>
> Karan Gupta
>
>
>
> *From:* Lionel Liu <[email protected]>
> *Sent:* Wednesday, April 11, 2018 10:34 AM
> *To:* [email protected]; Karan Gupta <
> [email protected]>
> *Subject:* Re: Griffin Service Error
>
>
>
> Hi Karan,
>
>
>
> I've read your log again and found that the error happens in the steps below:
>
>
>
> *1. You've configured hive.metastore.uris as
> "thrift://azudpoc2928.ent.lolcentral.com:9083", which is the correct one.*
>
> The Griffin server starts up and tries to connect to the Hive metastore
> service, and an error occurs:
>
> 2018-04-05 09:32:20.842  WARN 106074 --- [           main] hive.metastore
>                          : set_ugi() not successful, Likely cause: new
> client talking to old server. Continuing without it.
>
> org.apache.thrift.transport.TTransportException:
> java.net.SocketException: Connection reset
>
>
>
> *But it succeeds immediately afterwards:*
>
> 2018-04-05 09:32:20.843  INFO 106074 --- [           main] hive.metastore
>                          : Connected to metastore.
>
>
>
> *2. The Griffin service caches the Hive table metadata and refreshes it
> every 15 minutes.*
>
> The first refresh happens at start-up:
>
> 2018-04-05 09:32:23.248  INFO 106074 --- [pool-4-thread-1] 
> o.a.g.c.m.hive.HiveMetaStoreService
>     : Evict hive cache
>
>
>
> *But it fails with this error:*
>
> 2018-04-05 09:32:23.260 ERROR 106074 --- [pool-4-thread-1] hive.log
>                          : Got exception: 
> org.apache.thrift.transport.TTransportException
> java.net.SocketException: Broken pipe (Write failed)
>
> org.apache.thrift.transport.TTransportException:
> java.net.SocketException: Broken pipe (Write failed)
>
>
>
> *The Griffin service logs this error during the cache refresh; at this point
> the cache has been evicted, but fetching the new data fails:*
>
> 2018-04-05 09:32:23.263 ERROR 106074 --- [pool-4-thread-1] 
> o.a.g.c.m.hive.HiveMetaStoreService
>     : Can not get databases : Got exception: 
> org.apache.thrift.transport.TTransportException
> java.net.SocketException: Broken pipe (Write failed)
>
> 2018-04-05 09:32:23.263  INFO 106074 --- [pool-4-thread-1] 
> o.a.g.c.m.hive.HiveMetaStoreService
>     : After evict hive cache,automatically refresh hive tables cache.
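To make the sequence above concrete, here is a toy Python model of an evict-then-refresh cache. It is only an illustration under stated assumptions, not Griffin's HiveMetaStoreService code: `EvictThenRefreshCache` and the `fetch_databases` callable are hypothetical stand-ins for the service and its metastore call. It shows why a failed fetch right after eviction leaves the cache empty until a later refresh succeeds.

```python
from typing import Callable, Dict, List


class EvictThenRefreshCache:
    """Toy model of a metadata cache that is cleared before it is reloaded."""

    def __init__(self, fetch_databases: Callable[[], List[str]]):
        self._fetch = fetch_databases  # hypothetical stand-in for the metastore call
        self._cache: Dict[str, List[str]] = {}

    def refresh(self) -> bool:
        self._cache.clear()  # step 1: "Evict hive cache"
        try:
            self._cache["databases"] = self._fetch()  # step 2: reload
            return True
        except ConnectionError:
            # e.g. BrokenPipeError, a ConnectionError subclass: the cache stays empty
            return False

    def databases(self) -> List[str]:
        return self._cache.get("databases", [])
```

With a fetch that raises `BrokenPipeError`, `refresh()` returns `False` and `databases()` returns an empty list, which matches the "cache is evicted but new data fetch fails" state in the log.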
>
>
>
> *3. Then the Griffin service tries to reconnect to the Hive metastore
> asynchronously, but every time it tries to connect, the same error occurs:*
>
> 2018-04-05 09:32:23.269  INFO 106074 --- [pool-3-thread-1] hive.metastore
>                          : Trying to connect to metastore with URI thrift://
> azudpoc2928.ent.lolcentral.com:9083
>
> 2018-04-05 09:32:23.279  WARN 106074 --- [pool-3-thread-1] hive.metastore
>                          : set_ugi() not successful, Likely cause: new
> client talking to old server. Continuing without it.
>
> org.apache.thrift.transport.TTransportException:
> java.net.SocketException: Connection reset
>
>
>
> But it also connects successfully:
>
> 2018-04-05 09:32:23.280  INFO 106074 --- [pool-3-thread-1] hive.metastore
>                          : Connected to metastore.
>
>
>
> *4. After 15 minutes, the same thing happens again.*
>
>
>
>
>
> I think there are two problems we need to investigate:
>
>
>
> *1. Every time Griffin tries to connect to the Hive metastore, an error
> occurs, but the connection succeeds immediately afterwards.*
>
> I've googled this error message and found this:
> https://community.hortonworks.com/questions/146939/extration-warn-hivemetastore-set-ugi-not-successfu.html
>
> It seems there are too many client connections to the Hive metastore service.
>
>
>
> *2. Every time Griffin evicts the cache and tries to fetch new data using
> the connection built last time, it gets a broken pipe; it seems the
> connection does not stay alive long enough.*
>
> I wonder whether it could be solved by some configuration of the Hive
> metastore service, or whether it is also caused by too many connections.
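For the second problem (a broken pipe on a connection that was opened earlier), one generic client-side pattern is to reconnect and retry once when a call fails. The Python sketch below illustrates that pattern only; it is not an option that Griffin or Hive exposes under this name, `call_with_reconnect` is a hypothetical helper, and `reconnect`/`call` stand in for whatever re-opens and uses the metastore connection. It does not address a server-side cause such as too many connections.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_reconnect(
    reconnect: Callable[[], None],
    call: Callable[[], T],
    retries: int = 1,
) -> T:
    """Run `call`; on a connection error, reconnect and retry up to `retries` times."""
    attempt = 0
    while True:
        try:
            return call()
        except ConnectionError:  # includes BrokenPipeError and ConnectionResetError
            if attempt >= retries:
                raise  # out of retries: surface the original error
            attempt += 1
            reconnect()  # re-open the connection before the next attempt
```

Usage would look like `call_with_reconnect(client.connect, client.get_all_databases)`, where `client` is a hypothetical wrapper object; the call is attempted at most `retries + 1` times.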
>
>
>
> Could you check this? Hope it helps.
>
>
>
>
>
> Thanks,
>
> Lionel
>
> On Tue, Apr 10, 2018 at 8:57 PM, Karan Gupta <[email protected]>
> wrote:
>
> Hi Lionel,
>
>
>
> As far as I know, hive.metastore.uris is set correctly in
> application.properties. Could you suggest any alternative or a workaround?
>
>
>
>
>
> Thank you,
>
> Karan Gupta
>
>
>
> *From:* Lionel, Liu <[email protected]>
> *Sent:* Tuesday, April 10, 2018 6:00 PM
> *To:* Karan Gupta <[email protected]>; [email protected].
> org
> *Subject:* RE: Griffin Service Error
>
>
>
> Hi Karan,
>
>
>
> It seems that connecting to the Hive metastore service fails; you need to set
> "hive.metastore.uris" to the correct value in application.properties.
>
>
>
> Thanks
> Lionel, Liu
>
>
>
> *From: *Karan Gupta <[email protected]>
> *Sent: *April 10, 2018 17:59
> *To: *Lionel, Liu <[email protected]>
> *Subject: *Griffin Service Error
>
>
>
> Hi Lionel,
>
>
>
> I am encountering the error below when I try to run the Griffin service jar:
>
>
>
>
>
>
>
> Any guidance would be very helpful.
>
>
>
> Thank you,
>
> Karan Gupta
>
> *From:* Vinod Raina
> *Sent:* Monday, April 9, 2018 2:18 PM
> *To:* Lionel, Liu <[email protected]>
> *Cc:* Karan Gupta <[email protected]>
> *Subject:* RE: Few Questions about Griffin
>
>
>
> Thank you, Lionel, this information helps :)
>
>
>
>
>
>
>
> *Regards*
>
> *Vinod Raina* | [email protected]
>
> Associate Technical Architect
>
> M: +91 9711022965
>
>
>
> *From:* Lionel, Liu <[email protected]>
> *Sent:* Saturday, April 7, 2018 1:32 PM
> *To:* Vinod Raina <[email protected]>; [email protected].
> org
> *Cc:* Karan Gupta <[email protected]>
> *Subject:* RE: Few Questions about Griffin
>
>
>
> Hi Vinod,
>
>
>
> For the first question, it looks like the validity dimension: measuring
> the data items against the rules you define. The validity dimension has not
> been implemented in Griffin yet, but for now you can achieve it with
> profiling. For example, you can define the profiling rule as “select
> count(*) from source where length(telephone) = 10 and name is not null”,
> which produces the count of items matching such a rule; together with
> another metric for the total count, you get the percentage. In fact,
> getting the count metrics is better than getting the percentage directly.
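As a rough illustration of this counting approach, the Python sketch below runs the same kind of rule with the standard-library sqlite3 module against a hypothetical in-memory `source` table, then derives the percentage from the matched and total counts. It uses SQLite and its `length()` function, not Spark SQL, so it only approximates what a Griffin profiling rule would compute; the table, the rows and `validity_counts` are made up for the example. Note that the rule checks only the length of `telephone`, not that it is all digits.

```python
import sqlite3
from typing import Tuple


def validity_counts(conn: sqlite3.Connection) -> Tuple[int, int]:
    """Return (matched, total) for: 10-character telephone and non-null name."""
    matched = conn.execute(
        "SELECT count(*) FROM source WHERE length(telephone) = 10 AND name IS NOT NULL"
    ).fetchone()[0]
    total = conn.execute("SELECT count(*) FROM source").fetchone()[0]
    return matched, total


# Hypothetical sample data, only to exercise the rule.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (name TEXT, telephone TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?)",
    [("alice", "5550001111"), (None, "5550002222"), ("bob", "12345"), ("carol", "5550003333")],
)
matched, total = validity_counts(conn)
print(f"{matched}/{total} = {matched / total:.0%}")  # 2/4 = 50%
```

Keeping the two counts as separate metrics, as suggested above, also lets you recompute the percentage or aggregate it across runs later.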
>
> For the second question, I’m not very familiar with Kerberos, but at
> eBay we also use an HDFS cluster with Kerberos authentication. The Griffin
> measure module runs as a Spark application and supports all the Spark
> parameters, so it should work the same way as when you submit other Spark
> applications on your cluster. If that's not correct, please tell me, thanks.
>
>
>
> Thanks
> Lionel, Liu
>
>
>
> *From: *Vinod Raina <[email protected]>
> *Sent: *April 5, 2018 13:09
> *To: *Lionel Liu <[email protected]>; [email protected]
> *Cc: *Karan Gupta <[email protected]>
> *Subject: *RE: Few Questions about Griffin
>
>
>
> Thank you Lionel,
>
> I have 2 more follow-up queries:
>
>    1. My requirement is to check the data quality in terms of whether the
>    data conforms to the data types that I expect it to be. E.g. one column
>    may have a telephone number, so I expect it to be a 10-digit number;
>    another column is birthdate, so I expect it to be in a date format; or
>    there is a name column and I don’t want it to be null/missing. So I need
>    to create a metric report where I can see the percentage of data that
>    conforms to the validations we have created. Can Griffin do that?
>    2. Also, our HDFS is a kerberised cluster. Can Griffin work on a
>    kerberised cluster?
>
>
>
>
>
>
>
> *Regards*
>
> *Vinod Raina* | [email protected]
>
> Associate Technical Architect
>
> M: +91 9711022965
>
>
>
> *From:* Lionel Liu <[email protected]>
> *Sent:* Tuesday, April 3, 2018 2:16 PM
> *To:* [email protected]; Vinod Raina <
> [email protected]>
> *Cc:* Karan Gupta <[email protected]>
> *Subject:* Re: Few Questions about Griffin
>
>
>
> Hi Vinod,
>
>
>
> We're glad to receive your email. There are some other Griffin documents
> listed below:
>
> wiki: https://cwiki.apache.org/confluence/display/GRIFFIN/Apache+Griffin
>
> github: https://github.com/apache/incubator-griffin/tree/master/griffin-doc
>
> And you can follow
> https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md
> to try the Griffin docker image.
>
>
>
> For your questions, I'll list my answers:
>
>
>
> *1. What is the usage of accuracy metric? In what situations, it will be
> useful?*
>
>
>
> Accuracy measures the match percentage between two data sources, which we
> call "target" and "source": "source" is the data source you trust, and
> "target" is the data source you want to check.
>
> For example, say "source" is [1, 2, 3, 4, 5] and "target" is [1, 3, 5,
> 7, 9]; we get the accuracy #(target items matched in source) / #(all
> target items) = 3/5 = 60%. Actually, "exact match" is a narrow concept; in
> accuracy, we say "pass the match rule", and users can define their own
> "match rule", such as "source.age <= target.age AND upper(source.city) =
> upper(target.city)", instead of "exact match".
>
> When we have a data source we trust, we let it be the "source"; then we can
> measure the accuracy of another data source, the "target", to figure out
> how far we can trust it.
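The accuracy arithmetic and the idea of a user-defined match rule can be checked with a small Python sketch. This is a naive nested-loop illustration under stated assumptions, not how Griffin computes accuracy (Griffin evaluates the rule with Spark over large tables); `accuracy`, `match_rule` and `age_city_rule` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")
T = TypeVar("T")


def accuracy(
    source: Sequence[S],
    target: Sequence[T],
    match_rule: Callable[[S, T], bool],
) -> Tuple[float, List[T]]:
    """Return (matched target items / all target items, unmatched target items).

    A target item counts as matched if it passes `match_rule` against at least
    one source item. O(len(source) * len(target)): fine for a toy example only.
    """
    if not target:
        raise ValueError("target is empty")
    unmatched = [t for t in target if not any(match_rule(s, t) for s in source)]
    return (len(target) - len(unmatched)) / len(target), unmatched


# Exact match on the numbers from the example: 3 of the 5 target items match.
acc, wrong = accuracy([1, 2, 3, 4, 5], [1, 3, 5, 7, 9], lambda s, t: s == t)
print(acc, wrong)  # 0.6 [7, 9]


def age_city_rule(s: dict, t: dict) -> bool:
    """Python version of: source.age <= target.age AND upper(source.city) = upper(target.city)."""
    return s["age"] <= t["age"] and s["city"].upper() == t["city"].upper()
```

The unmatched list plays the role of the "wrong items" that Griffin persists for the target.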
>
>
>
> There's a standard use case:
>
> In our data pipeline, when we get users' data from the site, we persist it
> as table T1, which we trust as the source of truth. On the other hand, a
> copy of the users' data is pushed to some streaming or batch processes;
> after some steps, the processed data is persisted as table T2, and we want
> to know how correct it is, or how much we can trust it.
>
> Setting T1 as "source" and T2 as "target", we get the accuracy of T2, with
> the wrong items from T2 persisted.
>
>
>
> And another specific use case:
>
> We have a streaming data processing system that consumes data from input
> and produces output. Each output data item also contains the key of its
> input item, and we want to know how much data is successfully processed.
>
> Setting output as "source" and input as "target", we get the accuracy of
> input, and the missing items from input are persisted.
>
> Actually, this case measures the completeness of output, but it works like
> reversed accuracy, so we can use it this way.
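A minimal Python sketch of this reversed-accuracy idea, assuming each item is identified by a hashable key carried from input to output. The function name `completeness_by_key` is hypothetical, and because the match rule here is an exact key match, a set lookup replaces the general match rule.

```python
from typing import Iterable, List, Tuple


def completeness_by_key(
    input_keys: Iterable[str], output_keys: Iterable[str]
) -> Tuple[float, List[str]]:
    """Output acts as "source", input as "target": return the share of input
    keys found in the output, plus the input keys that are missing."""
    inputs = list(input_keys)
    if not inputs:
        raise ValueError("input is empty")
    seen = set(output_keys)  # exact-key match rule, O(1) lookups
    missing = [k for k in inputs if k not in seen]
    return (len(inputs) - len(missing)) / len(inputs), missing


ratio, missing = completeness_by_key(["k1", "k2", "k3", "k4"], ["k1", "k3", "k4"])
print(ratio, missing)  # 0.75 ['k2']
```

The `missing` list corresponds to the missing input items that Griffin persists in this use case.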
>
>
>
> However, in the Griffin measure configuration, the concepts of source and
> target are based on the code implementation, which differs from the
> business concept above. In the measure configuration documents, we're
> measuring the accuracy of "source".
>
> We are planning to modify the code implementation to align with the
> business concept later; when we do, we'll highlight it in the release notes.
>
>
>
>
>
> *2. Can we run other metrics using command-line? (or) Is only accuracy
> metric supported at the moment?*
>
>
>
> Yes, you can run the Griffin measure module directly from the command line,
> like this:
> https://github.com/bhlx3lyx7/griffin-docker/blob/master/svc_msr_new/prep/measure/start-accu.sh
>
> Currently, the Griffin UI module doesn't support all the dimensions, but the
> measure module supports accuracy, profiling, timeliness and uniqueness. You
> can find a description of them here:
> https://github.com/apache/incubator-griffin/blob/master/griffin-doc/measure/dsl-guide.md#griffin-dsl-translation-to-sql
>
>
>
>
>
> *3. Project roadmap for features?*
>
>
>
> The project roadmap was out of date; we've updated it:
> https://cwiki.apache.org/confluence/display/GRIFFIN/0.+Roadmap
>
> Some new features we're planning in the short term:
>
> - streaming measure job schedule.
>
> - more data quality dimensions support, such as completeness, consistency,
> validity.
>
> And in the long term, possibly:
>
> - more data sources support, such as RDBs, elasticsearch.
>
> - anomaly detection support.
>
> - spark 2 support.
>
>
>
>
>
> *4. Can we create custom rules and profile existing data?*
>
>
>
> Yes, you can create custom rules for your data, according to the documents:
> https://github.com/apache/incubator-griffin/blob/master/griffin-doc/measure/measure-configuration-guide.md
> and
> https://github.com/apache/incubator-griffin/blob/master/griffin-doc/measure/measure-batch-sample.md
>
> The profiling rule supports simple spark-sql syntax directly, as described in
> https://github.com/apache/incubator-griffin/blob/master/griffin-doc/measure/dsl-guide.md#profiling
>
> If you want to use spark-sql, you can also define the rules like this:
> https://github.com/apache/incubator-griffin/blob/master/griffin-doc/measure/dsl-guide.md#spark-sql
>
>
>
>
>
> *5. Postgresql and mysql -- both listed in Prerequisites. We have MySQL,
> Is that enough?*
>
>
>
> In fact, you can choose either PostgreSQL or MySQL.
>
> We used MySQL for measure and schedule persistence before, but due to a
> licensing issue with the release, we have had to switch to PostgreSQL recently.
>
> If you want to use MySQL, you need to modify some dependencies in the service
> module and the application.properties file, and rebuild service.jar as well.
>
> We are going to publish a document to help users set up MySQL or other databases.
>
>
>
>
>
> Hope this helps; please feel free to ask if you have any questions.
>
>
>
> Thanks,
>
> Lionel
>
>
>
> On Tue, Apr 3, 2018 at 1:41 PM, Vinod Raina <[email protected]>
> wrote:
>
> Hi Griffin team,
> In our team, we are looking to create a Data Quality model for our EDL
> ingestion and are exploring Apache Griffin for it. We have gone through the
> documentation. The documentation is still not complete, but we understand
> that the project is in incubation and there might be other reasons as well.
> It would be really helpful if there is any other source of information
> (other than the Apache portal and the GitHub readme) which can help us
> to understand the usage of this framework.
> Also, we have a few questions below and would really appreciate it if you
> could help us with the answers:
>
> 1. What is the usage of accuracy metric? In what situations, it will be
> useful?
> 2. Can we run other metrics using command-line? (or) Is only accuracy
> metric supported at the moment?
> 3. Project roadmap for features?
> 4. Can we create custom rules and profile existing data?
> 5. Postgresql and mysql -- both listed in Prerequisites. We have MySQL, Is
> that enough?
>
>
>
>
> Regards
> Vinod Raina | [email protected]<mailto:[email protected]>
> Associate Technical Architect
> M: +91 9711022965
> Tavant Technologies | www.tavant.com
> Okaya Centre, Tower 1, 5th Floor,B-5, Sector 62, Noida, UP 201 309
>
