Thanks for all your help. I think I have found the issue: all of our mappers have HADOOP_CLASSPATH overwritten in hadoop-env.sh :(
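The overwrite-vs-append distinction behind that root cause can be sketched in plain sh (the `/opt/extra/lib` path is made up for illustration; only `/etc/hbase/conf` comes from the thread):

```shell
# Simulate the caller's export, i.e. HADOOP_CLASSPATH=/etc/hbase/conf hadoop jar ...
HADOOP_CLASSPATH="/etc/hbase/conf"

# Overwriting style (what a broken hadoop-env.sh does): the caller's
# /etc/hbase/conf is lost, so hbase-site.xml never reaches the mappers.
BROKEN="/opt/extra/lib"

# Appending style (what hadoop-env.sh should do instead): the caller's
# entry survives in front of the distribution's additions.
FIXED="${HADOOP_CLASSPATH}:/opt/extra/lib"

echo "broken: $BROKEN"   # prints: broken: /opt/extra/lib
echo "fixed:  $FIXED"    # prints: fixed:  /etc/hbase/conf:/opt/extra/lib
```

If the distribution's hadoop-env.sh uses the overwriting form, exporting HADOOP_CLASSPATH on the command line is silently undone on every node.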
On Wed, Feb 20, 2019 at 4:50 PM Xiaoxiao Wang <xxw...@23andme.com> wrote:

Pedro, to answer your question: I have verified that the configuration file is loaded correctly from the classpath. I have 300+ mappers trying to make connections to the db at the same time, and it still gives me the same error, a timeout after 60000 ms.

On Wed, Feb 20, 2019 at 4:45 PM Xiaoxiao Wang <xxw...@23andme.com> wrote:

Since I now know that the configuration is loaded correctly through the classpath, I have tested on the real application; however, it still timed out with the same default value from the mappers:

Error: java.io.IOException: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions: Thu Feb 21 00:38:28 UTC 2019, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60309

On Wed, Feb 20, 2019 at 4:25 PM Xiaoxiao Wang <xxw...@23andme.com> wrote:

I made this work in my toy application: getConf() is not an issue, and the HBase conf picks up the correct settings. I'm trying again on the real application.

On Wed, Feb 20, 2019 at 4:13 PM William Shen <wills...@marinsoftware.com> wrote:

Whatever is in super.getConf() should get overridden by hbase-site.xml, because addHbaseResources will layer on hbase-site.xml last. The question is which one got picked up... (maybe there is another one on the classpath, is that possible?)

On Wed, Feb 20, 2019 at 4:10 PM Xiaoxiao Wang <xxw...@23andme.com.invalid> wrote:

I'm trying it on the MapReduce application; I made it work in my toy application.

On Wed, Feb 20, 2019 at 4:09 PM William Shen <wills...@marinsoftware.com> wrote:

A bit of a long shot, but do you happen to have another hbase-site.xml accidentally bundled in your jar that might be overriding what is on the classpath?

On Wed, Feb 20, 2019 at 3:58 PM Xiaoxiao Wang <xxw...@23andme.com.invalid> wrote:

A bit more information: I suspect the classpath didn't get passed in correctly, because after doing

conf = HBaseConfiguration.addHbaseResources(super.getConf());

this conf also didn't pick up the expected properties.

On Wed, Feb 20, 2019 at 3:56 PM Xiaoxiao Wang <xxw...@23andme.com> wrote:

Pedro, thanks for your info. Yes, I have tried both HADOOP_CLASSPATH=/etc/hbase/conf/hbase-site.xml and HADOOP_CLASSPATH=/etc/hbase/conf/ (without the file), and yes, I checked hadoop-env.sh as well to make sure it does HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/others.

And to your second question: it is indeed a MapReduce job, and it queries Phoenix from the map function! (We made sure all the nodes have hbase-site.xml installed properly.)

Thanks.

On Wed, Feb 20, 2019 at 3:53 PM Pedro Boado <pedro.bo...@gmail.com> wrote:

Your classpath variable should point to the folder containing your hbase-site.xml, not directly to the file.

But certain distributions tend to override that envvar inside hadoop-env.sh or hadoop.sh.

Out of curiosity: have you written a map-reduce application, and are you querying Phoenix from map functions?

On Wed, 20 Feb 2019, 23:34 Xiaoxiao Wang, <xxw...@23andme.com.invalid> wrote:

Hi Pedro, thanks for your help. I think we know that we need to set the classpath for the Hadoop program; what we tried was

HADOOP_CLASSPATH=/etc/hbase/conf/hbase-site.xml hadoop jar $test_jar

but it didn't work. So we are wondering whether we did anything wrong.

On Wed, Feb 20, 2019 at 3:24 PM Pedro Boado <pbo...@apache.org> wrote:

Hi,

How many concurrent client connections are we talking about? You might be opening more connections than the RS can handle (under these circumstances most of the client threads would end up exhausting their retry count). I would bet that you have a bottleneck in the RS hosting the SYSTEM.CATALOG table (this was an issue in 4.7), as every new connection queries this table first.

Try updating to our Cloudera-compatible parcels instead of using clabs, which are discontinued by Cloudera and not supported by the Apache Phoenix project.

Once updated to Phoenix 4.14 you should be able to use the UPDATE_CACHE_FREQUENCY property to reduce pressure on the system tables.

Adding an hbase-site.xml with the required properties to the client application classpath should just work.

I hope it helps.

On Wed, 20 Feb 2019, 22:50 Xiaoxiao Wang, <xxw...@23andme.com.invalid> wrote:

Hi, can anyone help?

We are running a Hadoop application that needs to use a Phoenix JDBC connection from the workers. The connection works, but when too many connections are established at the same time, it throws RPC timeouts:

Error: java.io.IOException: org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=36, exceptions: Wed Feb 20 20:02:43 UTC 2019, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=60506. ...

So we figured we should set a higher hbase.rpc.timeout value, and that is where the issue starts.

A little background on how we run the application. Here is how we get a PhoenixConnection from the Java program:

DriverManager.getConnection("jdbc:phoenix:host", props)

And we trigger the program with:

hadoop jar $test_jar

We have tried multiple approaches to load the HBase/Phoenix configuration, but none of them are respected by PhoenixConnection. Here are the methods we tried:

* Pass the HBase conf dir through HADOOP_CLASSPATH, i.e. run the Hadoop application like HADOOP_CLASSPATH=/etc/hbase/conf/ hadoop jar $test_jar. However, PhoenixConnection doesn't respect the parameters.
* Pass -Dhbase.rpc.timeout=1800, which is picked up by the HBase conf object, but not by PhoenixConnection.
* Explicitly set those parameters and pass them to the PhoenixConnection:
  props.setProperty("hbase.rpc.timeout", "1800");
  props.setProperty("phoenix.query.timeoutMs", "1800");
  Also not respected by PhoenixConnection.
* Also tried what Phoenix suggests at https://phoenix.apache.org/#connStr, using :longRunning together with those properties; that still didn't seem to work.

Besides all those approaches, I explicitly printed the parameters we care about from the connection via connection.getQueryServices().getProps(). The default values I got are 60000 for hbase.rpc.timeout and 600000 for phoenix.query.timeoutMs, so I ran a query that would take longer than 10 minutes. Ideally it should time out; however, it ran over 20 minutes and didn't time out. So I'm wondering how PhoenixConnection respects those properties?

With some of your help, we'd like to know whether there's anything wrong with our approaches, and we'd like to get rid of those SocketTimeoutExceptions.

We are using phoenix-core version 4.7.0-clabs-phoenix1.3.0, and our phoenix-client version is phoenix-4.7.0-clabs-phoenix1.3.0.23 (we have tried phoenix-4.14.0-HBase-1.3 as well, which didn't work either).

Thanks for your time.
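One detail worth flagging in the attempts above: hbase.rpc.timeout and phoenix.query.timeoutMs are both millisecond-valued, so a setting of "1800" would mean 1.8 seconds, below the 60000 ms default, even if it were picked up. A minimal pure-JDK sketch of building the connection properties with 30-minute values (the key names come from the thread; the values are illustrative):

```java
import java.util.Properties;

public class TimeoutProps {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Both keys are in milliseconds: 30 minutes = 1800000 ms, not "1800".
        props.setProperty("hbase.rpc.timeout", "1800000");
        props.setProperty("phoenix.query.timeoutMs", "1800000");
        // These props would then be handed to the driver, e.g.
        // DriverManager.getConnection("jdbc:phoenix:host", props);
        System.out.println(props.getProperty("hbase.rpc.timeout"));
        // prints: 1800000
    }
}
```

Whether PhoenixConnection honors a given override is a separate question (the thread's root cause was the clobbered classpath), but with "1800" the override would have tightened the timeout rather than relaxed it.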