[
https://issues.apache.org/jira/browse/CARBONDATA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
xubo245 updated CARBONDATA-3037:
--------------------------------
Description:
##Introduction
Reading data from S3 with the CarbonData SDK throws an exception.
##Problem
{code:java}
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" com.amazonaws.AmazonClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:454)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:976)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:956)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:892)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:77)
    at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.<init>(AbstractDFSCarbonFile.java:75)
    at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.<init>(AbstractDFSCarbonFile.java:66)
    at org.apache.carbondata.core.datastore.filesystem.HDFSCarbonFile.<init>(HDFSCarbonFile.java:41)
    at org.apache.carbondata.core.datastore.filesystem.S3CarbonFile.<init>(S3CarbonFile.java:41)
    at org.apache.carbondata.core.datastore.impl.DefaultFileTypeProvider.getCarbonFile(DefaultFileTypeProvider.java:53)
    at org.apache.carbondata.core.datastore.impl.FileFactory.getCarbonFile(FileFactory.java:99)
    at org.apache.carbondata.core.util.path.CarbonTablePath.getActualSchemaFilePath(CarbonTablePath.java:183)
    at org.apache.carbondata.core.util.path.CarbonTablePath.getSchemaFilePath(CarbonTablePath.java:178)
    at org.apache.carbondata.core.metadata.schema.SchemaReader.readCarbonTableFromStore(SchemaReader.java:41)
    at org.apache.carbondata.core.metadata.schema.table.CarbonTable.buildFromTablePath(CarbonTable.java:288)
    at org.apache.carbondata.core.datamap.DataMapStoreManager.getCarbonTable(DataMapStoreManager.java:496)
    at org.apache.carbondata.core.datamap.DataMapStoreManager.clearDataMaps(DataMapStoreManager.java:460)
    at org.apache.carbondata.sdk.file.CarbonReaderBuilder.build(CarbonReaderBuilder.java:180)
    at org.apache.carbondata.examples.sdk.SDKS3ReadExample.main(SDKS3ReadExample.java:67)
Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
    at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:232)
    at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:199)
    at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.amazonaws.http.conn.ClientConnectionRequestFactory$Handler.invoke(ClientConnectionRequestFactory.java:70)
    at com.amazonaws.http.conn.$Proxy7.getConnection(Unknown Source)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:456)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
    at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:384)
    ... 20 more
Process finished with exit code 1
{code}
##Analysis
The default value of fs.s3a.connection.maximum is 15. When the reader opens the 16th file,
no connection is left in the pool, so a ConnectionPoolTimeoutException is thrown.
The limit is set in org.apache.hadoop.fs.s3a.S3AFileSystem#initialize:
AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
    new AWSCredentialsProvider[]{
        new BasicAWSCredentialsProvider(accessKey, secretKey),
        new InstanceProfileCredentialsProvider(),
        new AnonymousAWSCredentialsProvider()});
this.bucket = name.getHost();
ClientConfiguration awsConf = new ClientConfiguration();
awsConf.setMaxConnections(conf.getInt("fs.s3a.connection.maximum", 15));
boolean secureConnections = conf.getBoolean("fs.s3a.connection.ssl.enabled", true);
awsConf.setProtocol(secureConnections ? Protocol.HTTPS : Protocol.HTTP);
awsConf.setMaxErrorRetry(conf.getInt("fs.s3a.attempts.maximum", 10));
awsConf.setConnectionTimeout(conf.getInt("fs.s3a.connection.establish.timeout", 50000));
awsConf.setSock
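The exhaustion described above can be reproduced without S3. The following is a minimal sketch, assuming a toy pool modelled with a `java.util.concurrent.Semaphore`; the class `BoundedPoolSketch` and its methods are hypothetical and are not part of the AWS SDK, Hadoop, or CarbonData. A real pool waits for a timeout before failing, while this sketch fails immediately to keep the example single-threaded.

```java
import java.util.concurrent.Semaphore;

/** Toy stand-in for a bounded HTTP connection pool (hypothetical, for illustration only). */
final class BoundedPoolSketch {
    private final Semaphore permits;

    BoundedPoolSketch(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    /** Leases one connection; fails at once if the pool is empty (a real pool would wait first). */
    void lease() {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("Timeout waiting for connection from pool");
        }
    }

    /** Returns one connection to the pool. */
    void release() {
        permits.release();
    }

    /** Opens files one by one and returns how many were opened before the pool ran dry. */
    static int openFiles(int maxConnections, int fileCount, boolean releaseAfterUse) {
        BoundedPoolSketch pool = new BoundedPoolSketch(maxConnections);
        int opened = 0;
        for (int i = 0; i < fileCount; i++) {
            try {
                pool.lease();
            } catch (IllegalStateException e) {
                return opened; // pool exhausted: this file cannot be opened
            }
            opened++;
            if (releaseAfterUse) {
                pool.release();
            }
        }
        return opened;
    }

    public static void main(String[] args) {
        // With 15 connections and no release, the 16th file fails.
        System.out.println(openFiles(15, 16, false)); // prints 15
        // Releasing after each use lets all 16 files through.
        System.out.println(openFiles(15, 16, true));  // prints 16
    }
}
```

With the limit of 15 taken from the default above, the first call stops after 15 files, which matches the failure on the 16th file.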
##Solution:
1. Temporary solution
Raise fs.s3a.connection.maximum in the Hadoop configuration passed to the reader:
Configuration configuration = new Configuration();
configuration.set(ACCESS_KEY, args[0]);
configuration.set(SECRET_KEY, args[1]);
configuration.set(ENDPOINT, args[2]);
configuration.set("fs.s3a.connection.maximum", "166");
CarbonReader reader = CarbonReader
    .builder(path, "_temp")
    .withHadoopConf(configuration)
    .build();
2. Final solution
Release each connection once it is no longer needed.
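One common way to guarantee that a connection is released is try-with-resources. The sketch below assumes a hypothetical `Lease` handle backed by a `Semaphore`; it is not the actual CarbonData fix and uses no CarbonData, Hadoop, or AWS classes. It only illustrates that a pool of 15 can serve many more files when every lease is closed after use.

```java
import java.util.concurrent.Semaphore;

/** Illustrates releasing pooled connections with try-with-resources (hypothetical names). */
final class ReleaseSketch {

    /** Hypothetical handle that gives its pool slot back when closed. */
    static final class Lease implements AutoCloseable {
        private final Semaphore pool;
        private boolean closed;

        Lease(Semaphore pool) {
            if (!pool.tryAcquire()) {
                throw new IllegalStateException("Timeout waiting for connection from pool");
            }
            this.pool = pool;
        }

        @Override
        public void close() {
            // Guard against double close so the pool never gains extra permits.
            if (!closed) {
                closed = true;
                pool.release();
            }
        }
    }

    /** Reads fileCount files through a pool of maxConnections; returns how many were read. */
    static int readAll(int maxConnections, int fileCount) {
        Semaphore pool = new Semaphore(maxConnections);
        int read = 0;
        for (int i = 0; i < fileCount; i++) {
            // The lease is closed at the end of the block, even if reading throws.
            try (Lease lease = new Lease(pool)) {
                read++; // placeholder for the actual file read
            }
        }
        return read;
    }

    public static void main(String[] args) {
        System.out.println(readAll(15, 100)); // prints 100
    }
}
```

Because each lease is returned before the next file is opened, the pool size no longer limits the number of files that can be read in sequence.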
> Throw ConnectionPoolTimeoutException when carbondata SDK read data from S3
> --------------------------------------------------------------------------
>
> Key: CARBONDATA-3037
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3037
> Project: CarbonData
> Issue Type: Improvement
> Affects Versions: 1.5.0
> Reporter: xubo245
> Assignee: xubo245
> Priority: Major
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)