[ https://issues.apache.org/jira/browse/HIVE-18582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346386#comment-16346386 ]
liubangchen commented on HIVE-18582:
------------------------------------
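We can add a method that validates partition paths in the method findUnknownPartitions of class HiveMetaStoreChecker, so that leaf directories which do not name every partition column are skipped instead of being reported as missing partitions: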
{code:java}
void findUnknownPartitions(Table table, Set<Path> partPaths,
    CheckResult result) throws IOException, HiveException {
  Path tablePath = table.getPath();
  // now check the table folder and see if we find anything
  // that isn't in the metastore
  Set<Path> allPartDirs = new HashSet<Path>();
  getAllLeafDirs(tablePath, allPartDirs);
  // don't want the table dir
  allPartDirs.remove(tablePath);
  // remove the partition paths we know about
  allPartDirs.removeAll(partPaths);
  // we should now only have the unexpected folders left
  for (Path partPath : allPartDirs) {
    FileSystem fs = partPath.getFileSystem(conf);
    // Path relative to the (qualified) table dir,
    // e.g. "log_date=2015121309/vgameid=lyjt"
    String partitionName = getPartitionName(fs.makeQualified(tablePath),
        partPath);
    if (partitionName == null) {
      continue;
    }
    // New check: skip leaf dirs that do not name every partition column,
    // such as an empty "log_date=2015063023" dir in a table partitioned
    // by (log_date, vgameid).
    if (!isValidPartitionName(table, partitionName)) {
      LOG.warn("Invalid partition path: " + partPath);
      continue;
    }
    PartitionResult pr = new PartitionResult();
    pr.setPartitionName(partitionName);
    pr.setTableName(table.getTableName());
    result.getPartitionsNotInMs().add(pr);
  }
}

/**
 * Returns true if the relative partition name (e.g. "k1=v1/k2=v2") has a
 * non-empty value for every partition column of the table.
 */
boolean isValidPartitionName(Table table, String partitionName) {
  if (partitionName.isEmpty()) {
    return false;
  }
  // Map each "key=value" component to its key and value.
  Map<String, String> partSpec = new java.util.HashMap<String, String>();
  for (String component : partitionName.split("/")) {
    int index = component.indexOf('=');
    if (index <= 0) {
      continue; // not a key=value component
    }
    partSpec.put(component.substring(0, index), component.substring(index + 1));
  }
  // Every partition column must be present with a non-empty value.
  for (FieldSchema field : table.getPartCols()) {
    String val = partSpec.get(field.getName());
    if (val == null || val.isEmpty()) {
      return false;
    }
  }
  return true;
}
{code}
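To try the key=value check outside Hive, here is a minimal stand-alone sketch of the same logic. It assumes the partition columns are given as a plain list of names instead of a Hive Table, and that the input is the path relative to the table directory (for example log_date=2015121309/vgameid=lyjt). The class PartitionNameCheck and its method are hypothetical names used only for this illustration.

{code:java}
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone version of the proposed check: partition columns
// are passed as a list of names instead of a Hive Table object.
public class PartitionNameCheck {

  static boolean isValidPartitionName(List<String> partCols, String partitionName) {
    if (partitionName.isEmpty()) {
      return false;
    }
    // Collect key -> value for every "key=value" component of the path.
    Map<String, String> partSpec = new HashMap<>();
    for (String component : partitionName.split("/")) {
      int index = component.indexOf('=');
      if (index <= 0) {
        continue; // not a key=value component
      }
      partSpec.put(component.substring(0, index), component.substring(index + 1));
    }
    // Every partition column needs a non-empty value.
    for (String col : partCols) {
      String val = partSpec.get(col);
      if (val == null || val.isEmpty()) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> partCols = Arrays.asList("log_date", "vgameid");
    // The empty directory from the report: vgameid is missing.
    System.out.println(isValidPartitionName(partCols, "log_date=2015063023")); // false
    // A complete partition directory.
    System.out.println(isValidPartitionName(partCols, "log_date=2015121309/vgameid=lyjt")); // true
  }
}
{code}

With the partition columns (log_date, vgameid) from the report, the empty log_date=2015063023 directory is rejected and log_date=2015121309/vgameid=lyjt is accepted, so only the second one would reach MSCK REPAIR TABLE.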
I will submit the patch.
> MSCK REPAIR TABLE Throw MetaException
> --------------------------------------
>
> Key: HIVE-18582
> URL: https://issues.apache.org/jira/browse/HIVE-18582
> Project: Hive
> Issue Type: Bug
> Components: Query Planning
> Affects Versions: 2.1.1
> Reporter: liubangchen
> Priority: Major
>
> While executing the query MSCK REPAIR TABLE tablename, I got this exception:
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Expected 1 components, got 2 (log_date=2015121309/vgameid=lyjt))
>     at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1847)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:402)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
>     at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
>     at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> --
> Caused by: MetaException(message:Expected 1 components, got 2 (log_date=2015121309/vgameid=lyjt))
>     at org.apache.hadoop.hive.metastore.Warehouse.makeValsFromName(Warehouse.java:385)
>     at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1845)
> {code}
> The table is PARTITIONED BY (log_date, vgameid).
> The directory layout on HDFS is:
>
> {code:java}
> /usr/hive/warehouse/a.db/tablename/log_date=2015063023
> drwxr-xr-x - root supergroup 0 2018-01-26 09:41 /usr/hive/warehouse/a.db/tablename/log_date=2015121309/vgameid=lyjt
> {code}
> The directory log_date=2015063023 is empty (it has no vgameid subdirectory).
> If I set hive.msck.path.validation=ignore, MSCK REPAIR TABLE executes successfully.
> Then I found this code in DDLTask:
> {code:java}
> private int msck(Hive db, MsckDesc msckDesc) {
>   CheckResult result = new CheckResult();
>   List<String> repairOutput = new ArrayList<String>();
>   try {
>     HiveMetaStoreChecker checker = new HiveMetaStoreChecker(db);
>     String[] names = Utilities.getDbTableName(msckDesc.getTableName());
>     checker.checkMetastore(names[0], names[1], msckDesc.getPartSpecs(), result);
>     List<CheckResult.PartitionResult> partsNotInMs = result.getPartitionsNotInMs();
>     if (msckDesc.isRepairPartitions() && !partsNotInMs.isEmpty()) {
>       // I think the bug is here
>       AbstractList<String> vals = null;
>       String settingStr = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_MSCK_PATH_VALIDATION);
>       boolean doValidate = !("ignore".equals(settingStr));
>       boolean doSkip = doValidate && "skip".equals(settingStr);
>       // The default setting is "throw"; assume doValidate && !doSkip means throw.
>       if (doValidate) {
>         // Validate that we can add partition without escaping. Escaping was originally intended
>         // to avoid creating invalid HDFS paths; however, if we escape the HDFS path (that we
>         // deem invalid but HDFS actually supports - it is possible to create HDFS paths with
>         // unprintable characters like ASCII 7), metastore will create another directory instead
>         // of the one we are trying to "repair" here.
>         Iterator<CheckResult.PartitionResult> iter = partsNotInMs.iterator();
>         while (iter.hasNext()) {
>           CheckResult.PartitionResult part = iter.next();
>           try {
>             vals = Warehouse.makeValsFromName(part.getPartitionName(), vals);
>           } catch (MetaException ex) {
>             throw new HiveException(ex);
>           }
>           for (String val : vals) {
>             String escapedPath = FileUtils.escapePathName(val);
>             assert escapedPath != null;
>             if (escapedPath.equals(val)) continue;
>             String errorMsg = "Repair: Cannot add partition " + msckDesc.getTableName()
>                 + ':' + part.getPartitionName() + " due to invalid characters in the name";
>             if (doSkip) {
>               repairOutput.add(errorMsg);
>               iter.remove();
>             } else {
>               throw new HiveException(errorMsg);
>             }
>           }
>         }
>       }
> {code}
> I think "AbstractList<String> vals = null;" must be placed just after "while (iter.hasNext()) {" so that every partition name gets a fresh list; then it will work.
>
>
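The failure can be reproduced outside Hive with a simplified model of the validation loop quoted above. This is a sketch under stated assumptions: makeValsFromName here is a hypothetical stand-in for Warehouse.makeValsFromName that keeps only the component-count check implied by the error message (the real method does more), settingStr plays the role of hive.msck.path.validation, and the class name MsckValsDemo is made up for the example.

{code:java}
import java.util.AbstractList;
import java.util.ArrayList;

// Hypothetical, simplified model of the validation loop in DDLTask.msck.
public class MsckValsDemo {

  // Stand-in for Warehouse.makeValsFromName: keeps only the size check.
  static AbstractList<String> makeValsFromName(String name, AbstractList<String> result) {
    String[] parts = name.split("/");
    if (result == null) {
      // Fresh list with one slot per component.
      result = new ArrayList<>(parts.length);
      for (int i = 0; i < parts.length; i++) {
        result.add(null);
      }
    } else if (parts.length != result.size()) {
      // A reused list must match the component count of the new name.
      throw new IllegalStateException("Expected " + result.size()
          + " components, got " + parts.length + " (" + name + ")");
    }
    for (int i = 0; i < parts.length; i++) {
      result.set(i, parts[i].substring(parts[i].indexOf('=') + 1));
    }
    return result;
  }

  // freshVals = true applies the proposed fix (vals reset per partition name).
  static String validate(String settingStr, boolean freshVals, String... names) {
    boolean doValidate = !("ignore".equals(settingStr));
    if (!doValidate) {
      return "ok"; // "ignore" skips the loop, so makeValsFromName is never called
    }
    AbstractList<String> vals = null;
    try {
      for (String name : names) {
        if (freshVals) {
          vals = null;
        }
        vals = makeValsFromName(name, vals);
      }
      return "ok";
    } catch (IllegalStateException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    String[] names = {"log_date=2015063023", "log_date=2015121309/vgameid=lyjt"};
    // Expected 1 components, got 2 (log_date=2015121309/vgameid=lyjt)
    System.out.println(validate("throw", false, names));
    System.out.println(validate("ignore", false, names)); // ok
    System.out.println(validate("throw", true, names));   // ok
  }
}
{code}

The "throw" run returns the same message as the stack trace, "ignore" succeeds because validation never runs, and resetting vals inside the loop also succeeds. Per the quoted code, "skip" would not help, because the MetaException is rethrown before doSkip is consulted. The order of the names is fixed here; in Hive it follows the HashSet iteration in findUnknownPartitions, so with the opposite order the message would read "Expected 2 components, got 1".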
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)