[ 
https://issues.apache.org/jira/browse/PHOENIX-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinyi Yan reassigned PHOENIX-5940:
----------------------------------

    Assignee: Xinyi Yan

> Pre-4.15 client cannot connect to 4.15+ server after SYSTEM.CATALOG region 
> has split
> ------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-5940
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-5940
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.14.3
>            Reporter: Chinmay Kulkarni
>            Assignee: Xinyi Yan
>            Priority: Blocker
>             Fix For: 4.16.0
>
>
> Steps to repro:
>  # Start the server with 4.15 or 4.16-SNAPSHOT (head of 4.x) with the default 
> setting for splitting SYSTEM.CATALOG i.e. 
> _phoenix.system.catalog.splittable=true_
>  # Connect with a 4.15+ client and create enough tables/views/indices to 
> cause the SYSTEM.CATALOG region to split. For a quicker repro, you may want 
> to set the following server-side configs:
>  ## _hbase.hregion.memstore.flush.size=1048576_ i.e. 1MB (to flush memstores 
> quicker),
>  ## _hbase.hregion.max.filesize=2097152_ i.e. 2MB (so we don't have to load 
> too much data to cause a region split),
>  ## _hbase.zookeeper.property.maxClientCnxns=-1_ (If we don’t set this, the 
> default limit is easily hit when creating hundreds of views),
>  ## _hbase.table.sanity.checks=false_ (otherwise HBase complains that the 
> HRegion max file size config is too small).
>  # With these configs, I've found that creating ~4000 views is sufficient to 
> cause the SYSTEM.CATALOG region to split.
>  # Now connect with any pre-4.15 client like 4.14.3. Getting a connection 
> will fail with the following stack trace:
> {noformat}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException):
>  org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2013 (INT15): ERROR 
> 2013 (INT15): MetadataEndpointImpl doGetTable called for table not present on 
> region tableName=SYSTEM.CATALOG SYSTEM.CATALOG
>       at 
> org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:114)
>       at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3214)
>       at 
> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:17268)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8338)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2170)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2152)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35076)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: java.sql.SQLException: ERROR 2013 (INT15): MetadataEndpointImpl 
> doGetTable called for table not present on region tableName=SYSTEM.CATALOG
>       at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:575)
>       at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:195)
>       at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2916)
>       at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3208)
>       ... 9 more
>       at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1275)
>       at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
>       at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.execService(ClientProtos.java:35542)
>       at 
> org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1702)
>       ... 13 more
> 20/06/04 19:14:18 WARN client.HTable: Error calling coprocessor service 
> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService for 
> row
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hbase.DoNotRetryIOException: 
> org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2013 (INT15): ERROR 2013 
> (INT15): MetadataEndpointImpl doGetTable called for table not present on 
> region tableName=SYSTEM.CATALOG SYSTEM.CATALOG
>       at 
> org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:114)
>       at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3214)
>       at 
> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:17268)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8338)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2170)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2152)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35076)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: java.sql.SQLException: ERROR 2013 (INT15): MetadataEndpointImpl 
> doGetTable called for table not present on region tableName=SYSTEM.CATALOG
>       at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:575)
>       at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:195)
>       at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2916)
>       at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3208)
>       ... 9 more
>       at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>       at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>       at 
> org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1775)
>       at 
> org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1731)
>       at 
> org.apache.phoenix.query.ConnectionQueryServicesImpl.checkClientServerCompatibility(ConnectionQueryServicesImpl.java:1350)
>       at 
> org.apache.phoenix.query.ConnectionQueryServicesImpl.ensureTableCreated(ConnectionQueryServicesImpl.java:1239)
>       at 
> org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1576)
>       at 
> org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:2731)
>       at 
> org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:1115)
>       at 
> org.apache.phoenix.compile.CreateTableCompiler$1.execute(CreateTableCompiler.java:192)
>       at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:410)
>       at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:393)
>       at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>       at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:392)
>       at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:380)
>       at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1810)
>       at 
> org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2623)
>       at 
> org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2586)
>       at 
> org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
>       at 
> org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2586)
>       at 
> org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:255)
>       at 
> org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:144)
>       at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:221)
>       at sqlline.DatabaseConnection.connect(DatabaseConnection.java:157)
>       at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:203)
>       at sqlline.Commands.connect(Commands.java:1064)
>       at sqlline.Commands.connect(Commands.java:996)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at 
> sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38)
>       at sqlline.SqlLine.dispatch(SqlLine.java:809)
>       at sqlline.SqlLine.initArgs(SqlLine.java:588)
>       at sqlline.SqlLine.begin(SqlLine.java:661)
>       at sqlline.SqlLine.start(SqlLine.java:398)
>       at sqlline.SqlLine.main(SqlLine.java:291)
> {noformat}
> RS logs for the region throwing the error:
> {noformat}
> 2020-06-04 19:14:18,655 ERROR 
> [RpcServer.FifoWFPBQ.default.handler=29,queue=2,port=56704] 
> coprocessor.MetaDataEndpointImpl: loading system catalog table inside 
> getVersion failed
> java.sql.SQLException: ERROR 2013 (INT15): MetadataEndpointImpl doGetTable 
> called for table not present on region tableName=SYSTEM.CATALOG
>       at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:575)
>       at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:195)
>       at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2916)
>       at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3208)
>       at 
> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:17268)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8338)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2170)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2152)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35076)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> 2020-06-04 19:14:18,656 DEBUG 
> [RpcServer.FifoWFPBQ.default.handler=29,queue=2,port=56704] ipc.RpcServer: 
> RpcServer.FifoWFPBQ.default.handler=29,queue=2,port=56704: callId: 7 service: 
> ClientService methodName: ExecService size: 131 connection: 10.3.4.181:57305
> org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2013 (INT15): ERROR 2013 
> (INT15): MetadataEndpointImpl doGetTable called for table not present on 
> region tableName=SYSTEM.CATALOG SYSTEM.CATALOG
>       at 
> org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:114)
>       at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3214)
>       at 
> org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:17268)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8338)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2170)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2152)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:35076)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: java.sql.SQLException: ERROR 2013 (INT15): MetadataEndpointImpl 
> doGetTable called for table not present on region tableName=SYSTEM.CATALOG
>       at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:575)
>       at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:195)
>       at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2916)
>       at 
> org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getVersion(MetaDataEndpointImpl.java:3208)
>       ... 9 more
> {noformat}
> This happens because a pre-4.15 client, inside 
> CQSI.checkClientServerCompatibility, invokes the getVersion method on 
> MetaDataEndpointImpl over *all SYSTEM.CATALOG regions* (we pass in null for 
> startKey and endKey), see 
> [this|https://github.com/apache/phoenix/blob/e2993552dc88cb7fc59fc0dfdaa2876ac260886c/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L1350].
> Inside MetaDataEndpointImpl#getVersion, we [call 
> doGetTable|https://github.com/apache/phoenix/blob/77c6cb32fce04b912b7c502dc170d86af8293fe6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L3224].
> Now, if SYSTEM.CATALOG has split, this call will also be invoked on a region 
> that does not contain the header row for SYSTEM.CATALOG, causing it to fail in 
> MetaDataEndpointImpl#doGetTable 
> [here|https://github.com/apache/phoenix/blob/77c6cb32fce04b912b7c502dc170d86af8293fe6/phoenix-core/src/main/java/org/apache/phoenix/coprocessor/MetaDataEndpointImpl.java#L2928-L2933].
> This is avoided in 4.15+ clients since we have restricted the getVersion 
> invocation to the region containing the header row for SYSTEM.CATALOG (see 
> [this|https://github.com/apache/phoenix/blob/77c6cb32fce04b912b7c502dc170d86af8293fe6/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L1517]).
> Inside MetaDataEndpointImpl, we need to add a special condition that accounts 
> for pre-4.15 clients before propagating the error back to the client.
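
To make the failure mode described in the issue concrete, here is a minimal, self-contained sketch. It is a toy model and not Phoenix or HBase code: the Region class, the string stand-in for the header row key, the split point "T", and the serverSkipsForOldClients flag are all hypothetical names introduced for illustration. It only assumes what the description states: a pre-4.15 client invokes getVersion on every SYSTEM.CATALOG region, a 4.15+ client restricts the call to the region holding the header row, and the call fails on a region that lacks that row. The last flag models one possible shape of the proposed special condition; the actual fix may look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these names are Phoenix or HBase APIs.
class CatalogSplitSketch {

    // Hypothetical stand-in for the SYSTEM.CATALOG header row key.
    static final String HEADER_ROW = "SYSTEM.CATALOG";

    /** A region covering row keys in [startKey, endKey); an empty endKey means unbounded. */
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        boolean contains(String rowKey) {
            return rowKey.compareTo(startKey) >= 0
                    && (endKey.isEmpty() || rowKey.compareTo(endKey) < 0);
        }
    }

    /** Regions a client would invoke getVersion on, per the description in the issue. */
    static List<Region> targetRegions(List<Region> all, boolean clientIsPre415) {
        if (clientIsPre415) {
            return all; // null startKey/endKey: every region is called
        }
        List<Region> result = new ArrayList<>();
        for (Region r : all) {
            if (r.contains(HEADER_ROW)) {
                result.add(r); // 4.15+ clients: only the header-row region
            }
        }
        return result;
    }

    /** Counts the simulated getVersion calls that would raise "table not present on region". */
    static int failingCalls(List<Region> all, boolean clientIsPre415,
                            boolean serverSkipsForOldClients) {
        int failures = 0;
        for (Region r : targetRegions(all, clientIsPre415)) {
            boolean headerHere = r.contains(HEADER_ROW);
            // Assumed form of the special condition: do not raise for old clients.
            boolean skip = clientIsPre415 && serverSkipsForOldClients;
            if (!headerHere && !skip) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Region> unsplit = Arrays.asList(new Region("", ""));
        List<Region> split = Arrays.asList(new Region("", "T"), new Region("T", ""));

        System.out.println(failingCalls(unsplit, true, false)); // 0: single region has the header row
        System.out.println(failingCalls(split, true, false));   // 1: second region lacks the header row
        System.out.println(failingCalls(split, false, false));  // 0: 4.15+ client calls one region only
        System.out.println(failingCalls(split, true, true));    // 0: server tolerates the old client
    }
}
```

In this model the old client only breaks once the table has more than one region, which matches the repro steps: nothing fails until SYSTEM.CATALOG has split.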



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
