[
https://issues.apache.org/jira/browse/SPARK-22457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16326959#comment-16326959
]
Jacek Laskowski commented on SPARK-22457:
-----------------------------------------
That should be fairly easy to fix _iff_ we want to restrict the formats to
{{FileFormat}} implementations (which the mentioned formats are subtypes of).
Care to submit a pull request that limits the places where {{path}} is used
to {{FileFormat}}s only? (That would help draw more attention to the issue.)
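To make the idea concrete, here is a minimal sketch of restricting the injected {{path}} option to file-based sources. It uses simplified stand-in types ({{TableStub}}, {{Provider}}, {{PathOptionSketch}} are all hypothetical names), not Spark's actual classes, so treat it as an illustration of the intent rather than the real change:

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's real classes.
sealed trait Provider
case object FileBased extends Provider    // e.g. parquet, orc, csv, json, text
case object NonFileBased extends Provider // e.g. an elasticsearch connector

final case class TableStub(
    provider: Provider,
    locationUri: Option[String],
    options: Map[String, String])

object PathOptionSketch {
  // Behaviour as described in the issue: whenever a locationUri exists, it is
  // added as a "path" option and replaces any "path" the user supplied.
  def optionsToday(t: TableStub): Map[String, String] = {
    val pathOption = t.locationUri.map(uri => "path" -> uri)
    t.options ++ pathOption
  }

  // Assumed restriction: only file-based providers get the injected "path".
  def optionsRestricted(t: TableStub): Map[String, String] = t.provider match {
    case FileBased    => optionsToday(t)
    case NonFileBased => t.options
  }
}
```

With a non-file-based table whose options already contain {{path}}, {{optionsToday}} replaces the user's value with the location URI, while {{optionsRestricted}} leaves it untouched.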
> Tables are supposed to be MANAGED only taking into account whether a path is
> provided
> -------------------------------------------------------------------------------------
>
> Key: SPARK-22457
> URL: https://issues.apache.org/jira/browse/SPARK-22457
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: David Arroyo
> Priority: Major
>
> As far as I know, since Spark 2.2, whether a table is MANAGED is decided
> solely by whether a path is provided:
> {code:java}
> val tableType = if (storage.locationUri.isDefined) {
>   CatalogTableType.EXTERNAL
> } else {
>   CatalogTableType.MANAGED
> }
> {code}
> This solution seems to be right for filesystem based data sources. On the
> other hand, when working with other data sources such as elasticsearch, it
> leads to the weird behaviour described below:
> 1) InMemoryCatalog's doCreateTable() adds a locationUri if
> CatalogTableType.MANAGED && tableDefinition.storage.locationUri.isEmpty.
> 2) Before loading the data source table, FindDataSourceTable's
> readDataSourceTable() adds a path option if locationUri exists:
> {code:java}
> val pathOption =
>   table.storage.locationUri.map("path" -> CatalogUtils.URIToString(_))
> {code}
> 3) That causes an error when reading from elasticsearch, because 'path' is an
> option already supported by elasticsearch (locationUri is set to
> file:/home/user/spark-rv/elasticsearch/shop/clients):
> org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot find
> mapping for file:/home/user/spark-rv/elasticsearch/shop/clients - one is
> required before using Spark SQL
> Would it be possible to mark tables as MANAGED only for a subset of data
> sources (TEXT, CSV, JSON, JDBC, PARQUET, ORC, HIVE), or is there another
> solution to consider?
> P.S. InMemoryCatalog's doDropTable() deletes the directory of the table, which
> from my point of view should only be required for filesystem based data
> sources:
> {code:java}
> if (tableMeta.tableType == CatalogTableType.MANAGED)
> ...
>   // Delete the data/directory of the table
>   val dir = new Path(tableMeta.location)
>   try {
>     val fs = dir.getFileSystem(hadoopConfig)
>     fs.delete(dir, true)
>   } catch {
>     case e: IOException =>
>       throw new SparkException(s"Unable to drop table $table as failed " +
>         s"to delete its directory $dir", e)
>   }
> {code}
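On the P.S. quoted above, the same kind of guard could apply when dropping a table. The following is a minimal sketch under the assumption that a table can be flagged as file-based; {{DropTarget}}, {{fileBased}} and {{DropSketch}} are hypothetical names for illustration, not part of {{InMemoryCatalog}}:

```scala
// Hypothetical, simplified model; this is NOT Spark's InMemoryCatalog.
sealed trait TableKind
case object Managed extends TableKind
case object External extends TableKind

final case class DropTarget(
    kind: TableKind,
    fileBased: Boolean,          // assumed flag: true for filesystem based sources
    location: Option[String])

object DropSketch {
  // Returns the directory a drop would delete, if any: only MANAGED tables
  // backed by a filesystem based source have their directory removed.
  def directoryToDelete(t: DropTarget): Option[String] =
    if (t.kind == Managed && t.fileBased) t.location else None
}
```

Under this assumption, a MANAGED elasticsearch table would be removed from the catalog without any attempt to delete a local directory.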