Mithun Radhakrishnan created HIVE-11475:
-------------------------------------------

             Summary: Bad rename of directory during commit, when using HCat dynamic-partitioning.
                 Key: HIVE-11475
                 URL: https://issues.apache.org/jira/browse/HIVE-11475
             Project: Hive
          Issue Type: Bug
          Components: HCatalog
    Affects Versions: 1.2.0
            Reporter: Mithun Radhakrishnan
            Assignee: Mithun Radhakrishnan
            Priority: Critical

Here's one that [~knoguchi] found and root-caused. This one's a doozy.

Under seemingly random conditions, the temporary output (under {{_SCRATCH1.234*}}) for HCat's dynamic partitioner isn't promoted correctly to the final table directory. The namenode logs indicated a botched directory-rename:

{noformat}
2015-08-02 03:24:29,090 INFO FSNamesystem.audit: allowed=true ugi=myth (auth:TOKEN) via wrkf...@grid.myth.net (auth:TOKEN) ip=/10.192.100.117 cmd=rename src=/projects/hive/myth.db/myth_table_15m/_SCRATCH2.8772158158263395E-4/tc=1/utc_time=201508020145/part-r-00000 dst=/projects/hive/myth.db/myth_table_15mE-4/tc=1/utc_time=201508020145/part-r-00000 perm=myth:madcaps:rw-r-r- proto=rpc
{noformat}

Note that the table-directory name {{"myth_table_15m"}} has {{"E-4"}} appended to it in the rename destination. This'll break anything that uses HDFS-based polling.

[~knoguchi] points out the following code:

{code:title=HCatOutputFormat.java}
if ((idHash = conf.get(HCatConstants.HCAT_OUTPUT_ID_HASH)) == null) {
  idHash = String.valueOf(Math.random());
}
{code}

{code:title=FileOutputCommitterContainer.java}
String finalLocn = jobLocation.replaceAll(Path.SEPARATOR + SCRATCH_DIR_NAME + "\\d\\.?\\d+","");
{code}

The problem is that when {{Math.random()}} produces a number less than 10 ^-3^, {{String.valueOf(double)}} uses exponential notation. The regex doesn't capture or handle this notation: it matches the digits of the mantissa and stops at the {{E}}, so the exponent suffix survives the {{replaceAll}} and ends up glued to the table-directory name.

The fix belies the debugging-effort.
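
For anyone who wants to see the failure outside of HCatalog, here is a minimal standalone sketch. It assumes {{Path.SEPARATOR}} is {{"/"}} and that {{SCRATCH_DIR_NAME}} is {{"_SCRATCH"}} (inferred from the audit-log path above). The class name, the {{stripScratch}} helper and the exponent-aware pattern are hypothetical, made up for illustration. In particular, the second pattern is only one possible way to tolerate the exponent, not necessarily the fix that gets committed.

```java
class ScratchDirRenameDemo {
    // Assumed stand-ins for Path.SEPARATOR and SCRATCH_DIR_NAME.
    static final String SEPARATOR = "/";
    static final String SCRATCH_DIR_NAME = "_SCRATCH";

    // The pattern quoted from FileOutputCommitterContainer.java.
    static final String ORIGINAL_REGEX =
        SEPARATOR + SCRATCH_DIR_NAME + "\\d\\.?\\d+";

    // Hypothetical variant that also consumes an optional exponent suffix.
    static final String EXPONENT_AWARE_REGEX =
        SEPARATOR + SCRATCH_DIR_NAME + "\\d\\.?\\d+(?:E-?\\d+)?";

    // Hypothetical helper: removes the scratch-directory component from a path.
    static String stripScratch(String jobLocation, String regex) {
        return jobLocation.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // The value seen in the namenode log; anything below 1e-3 behaves the same.
        double idHash = 2.8772158158263395E-4;
        String jobLocation = "/projects/hive/myth.db/myth_table_15m"
            + SEPARATOR + SCRATCH_DIR_NAME + String.valueOf(idHash);

        System.out.println(jobLocation);
        // The original pattern stops at 'E', so the exponent is left behind.
        System.out.println(stripScratch(jobLocation, ORIGINAL_REGEX));
        // The exponent-aware variant removes the whole scratch component.
        System.out.println(stripScratch(jobLocation, EXPONENT_AWARE_REGEX));
    }
}
```

With the original pattern, the second line printed ends in {{myth_table_15mE-4}}, which matches the {{dst}} path in the audit log. A sturdier option might be to avoid formatting a raw double into a directory name in the first place, but that is a design call for the actual patch.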