[
https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538548#comment-14538548
]
Mostafa Mokhtar commented on HIVE-8065:
---------------------------------------
[~spena]
I am running on a cluster with NameNode HA enabled and I can't run queries;
was NameNode HA tested? This is the call stack I am getting:
{code}
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1866)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStrongestEncryptedTablePath(SemanticAnalyzer.java:1943)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getStagingDirectoryPathname(SemanticAnalyzer.java:1975)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1788)
    ... 25 more
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs://namenode:8020/apps/hive/warehouse/test_table, expected: hdfs://namenode
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1906)
    at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
    at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1245)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.isPathEncrypted(SemanticAnalyzer.java:1862)
    ... 28 more
{code}
I stepped through with the debugger and found that the authorities do not match:
thisAuthority is "namenode" (id=86) while thatAuthority is "namenode:8020" (id=99).
In FileSystem.checkPath, when the authorities do not match, the code falls
through and throws the exception:
{code}
    thatAuthority = uri.getAuthority();
    if (thisAuthority == thatAuthority ||         // authorities match
        (thisAuthority != null &&
         thisAuthority.equalsIgnoreCase(thatAuthority)))
      return;
  }
}
throw new IllegalArgumentException("Wrong FS: " + path +
    ", expected: " + this.getUri());
{code}
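For illustration, the mismatch can be reproduced with plain java.net.URI (the hostnames below mirror the stack trace; this is a standalone sketch, not Hive code):

```java
import java.net.URI;

public class AuthorityMismatch {
    public static void main(String[] args) {
        // The FileSystem was initialized from the HA nameservice URI, which
        // carries no port, while the table path carries an explicit port.
        URI fsUri = URI.create("hdfs://namenode");
        URI pathUri = URI.create("hdfs://namenode:8020/apps/hive/warehouse/test_table");

        String thisAuthority = fsUri.getAuthority();   // "namenode"
        String thatAuthority = pathUri.getAuthority(); // "namenode:8020"

        // The same case-insensitive comparison FileSystem.checkPath performs:
        // it fails here, which triggers the "Wrong FS" IllegalArgumentException.
        System.out.println(thisAuthority.equalsIgnoreCase(thatAuthority)); // false
    }
}
```

If this is what happens in the HA case, a raw authority comparison can never match an HA nameservice URI against a fully qualified namenode:port path, so normalizing the path before comparing would seem necessary.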
Is this an issue with the HDFS encryption support in Hive, or do you think it
is a configuration issue?
On a related note, I don't think Hive should check whether the staging
directory is encrypted when none of the Hive-managed tables are encrypted.
> Support HDFS encryption functionality on Hive
> ---------------------------------------------
>
> Key: HIVE-8065
> URL: https://issues.apache.org/jira/browse/HIVE-8065
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 0.13.1
> Reporter: Sergio Peña
> Assignee: Sergio Peña
> Labels: Hive-Scrum
>
> The new encryption support in HDFS makes Hive incompatible and unusable when
> this feature is used.
> HDFS encryption is designed so that a user can configure different encryption
> zones (directories) for multi-tenant environments. Each encryption zone has an
> exclusive encryption key, such as AES-128 or AES-256. For security compliance,
> HDFS does not allow moving/renaming files between encryption zones; renames
> are allowed only within the same encryption zone, while copies between zones
> are permitted.
> See HDFS-6134 for more details about the HDFS encryption design.
> Hive currently uses a scratch directory (like /tmp/$user/$random). This
> scratch directory holds the output of intermediate data (between MR jobs) and
> the final output of the Hive query, which is later moved to the table's
> directory location.
> If Hive tables are in different encryption zones than the scratch directory,
> then Hive won't be able to rename those files/directories, which makes Hive
> unusable.
> To handle this problem, we can place the scratch directory of the
> query/statement inside the same encryption zone as the table's directory
> location. This way, the rename will succeed.
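The placement rule above can be sketched as follows; `stagingPathFor` is a hypothetical helper for illustration, not Hive's actual API:

```java
import java.util.concurrent.ThreadLocalRandom;

public class StagingDirSketch {
    // Hypothetical helper: build the per-query staging path under the target
    // table's own directory, so the final rename never crosses an encryption
    // zone boundary.
    static String stagingPathFor(String tableLocation) {
        long id = ThreadLocalRandom.current().nextLong(Long.MAX_VALUE);
        return tableLocation + "/.hive-staging_" + id;
    }

    public static void main(String[] args) {
        String staging = stagingPathFor("/apps/hive/warehouse/test_table");
        // Staging shares the table's encryption zone, so the final move is a
        // same-zone rename instead of a forbidden cross-zone rename.
        System.out.println(staging.startsWith("/apps/hive/warehouse/test_table/.hive-staging_")); // true
    }
}
```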
> Also, for statements that move files between encryption zones (e.g. LOAD
> DATA), a copy may be executed instead of a rename. This adds overhead when
> copying large data files, but it won't break encryption on Hive.
> Another security concern is joins. If Hive joins tables encrypted with keys
> of different strengths, the results of the select might break the security
> compliance of the tables. Say two tables with 128-bit and 256-bit encryption
> are joined; the temporary results might be stored in the 128-bit encryption
> zone, which conflicts with the compliance of the 256-bit table.
> To fix this, Hive should select the most strongly encrypted scratch directory
> available, so the temporary intermediate data is stored without compliance
> issues.
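Choosing the most strongly encrypted zone can be sketched like this; the `Zone` record and `strongest` helper are illustrative assumptions, not Hive's API:

```java
import java.util.Comparator;
import java.util.List;

public class StrongestZone {
    // Hypothetical model of an encryption zone: its path and key length in bits.
    record Zone(String path, int keyBits) {}

    // Pick the zone with the strongest key, so intermediate results are never
    // written under weaker encryption than any of the input tables.
    static Zone strongest(List<Zone> zones) {
        return zones.stream()
                .max(Comparator.comparingInt(Zone::keyBits))
                .orElseThrow();
    }

    public static void main(String[] args) {
        Zone z = strongest(List.of(
                new Zone("/apps/hive/warehouse/table-aes128", 128),
                new Zone("/apps/hive/warehouse/table-aes256", 256)));
        System.out.println(z.path()); // /apps/hive/warehouse/table-aes256
    }
}
```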
> For instance:
> {noformat}
> SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id;
> {noformat}
> - This should use a scratch directory (or staging directory) inside the
> table-aes256 table location.
> {noformat}
> INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1;
> {noformat}
> - This should use a scratch directory inside the table-aes1 location.
> {noformat}
> FROM table-unencrypted
> INSERT OVERWRITE TABLE table-aes128 SELECT id, name
> INSERT OVERWRITE TABLE table-aes256 SELECT id, name
> {noformat}
> - This should use a scratch directory in each of the target table locations.
> - The first INSERT will have its scratch directory in the table-aes128 directory.
> - The second INSERT will have its scratch directory in the table-aes256 directory.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)