[ https://issues.apache.org/jira/browse/HDFS-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579641#comment-14579641 ]

Kevin Beyer commented on HDFS-196:
----------------------------------

It turns out the file is still considered "open for write"...  This is an hour 
or two after the client aborted, and after restarting HDFS.  The expected 
behavior would be for the file to be closed shortly after the client aborts.

{noformat}
kevin-beyer-mbp2:t kevin$ hdfs fsck /tmp/junk17._COPYING_ -blocks
2015-06-09 14:52:02,807 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://localhost:50070
FSCK started by kevin (auth:SIMPLE) from /127.0.0.1 for path /tmp/junk17._COPYING_ at Tue Jun 09 14:52:03 PDT 2015
Status: HEALTHY
 Total size:    0 B (Total open files size: 1073741824 B)
 Total dirs:    0
 Total files:   0
 Total symlinks:                0 (Files currently being written: 1)
 Total blocks (validated):      0 (Total open file blocks (not validated): 8)
 Minimally replicated blocks:   0
 Over-replicated blocks:        0
 Under-replicated blocks:       0
 Mis-replicated blocks:         0
 Default replication factor:    1
 Average block replication:     0.0
 Corrupt blocks:                0
 Missing replicas:              0
 Number of data-nodes:          1
 Number of racks:               1
FSCK ended at Tue Jun 09 14:52:03 PDT 2015 in 2 milliseconds


The filesystem under path '/tmp/junk17._COPYING_' is HEALTHY
kevin-beyer-mbp2:t kevin$ hdfs fsck -openforwrite
2015-06-09 14:52:25,236 WARN  [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://localhost:50070
FSCK started by kevin (auth:SIMPLE) from /127.0.0.1 for path / at Tue Jun 09 14:52:26 PDT 2015
....................................................................................................
./tmp/junk17._COPYING_ 1073741824 bytes, 8 block(s), OPENFORWRITE: 
..................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
..........................................................................Status:
 HEALTHY
 Total size:    3905069992 B
 Total dirs:    276
 Total files:   674
 Total symlinks:                0
 Total blocks (validated):      321 (avg. block size 12165327 B)
 Minimally replicated blocks:   321 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     1.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          1
 Number of racks:               1
FSCK ended at Tue Jun 09 14:52:26 PDT 2015 in 167 milliseconds
{noformat}
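
In case it helps anyone who hits the same state: the lease on the abandoned file can be recovered explicitly, which makes the NameNode finalize the last block and close the file. A minimal sketch (not part of the original report), assuming the stuck path from the fsck output above and a default client configuration:

{code:java}
// Hedged sketch: explicitly ask the NameNode to recover the lease on the
// abandoned file so it gets closed. Path and configuration are assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class ForceLeaseRecovery {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    if (fs instanceof DistributedFileSystem) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      Path stuck = new Path("/tmp/junk17._COPYING_");
      // recoverLease() returns true once the lease has been released and the
      // file is closed; block recovery is asynchronous, so it may need polling.
      boolean closed = dfs.recoverLease(stuck);
      System.out.println("lease recovered and file closed: " + closed);
    }
  }
}
{code}

Recent releases should also expose the same operation from the shell as {{hdfs debug recoverLease -path /tmp/junk17._COPYING_}}.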

> File length not reported correctly after application crash
> ----------------------------------------------------------
>
>                 Key: HDFS-196
>                 URL: https://issues.apache.org/jira/browse/HDFS-196
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Doug Judd
>
> Our application (Hypertable) creates a transaction log in HDFS.  This log is 
> written with the following pattern:
> out_stream.write(header, 0, 7);
> out_stream.sync();
> out_stream.write(data, 0, amount);
> out_stream.sync();
> [...]
> However, if the application crashes and then comes back up again, the 
> following statement
> length = mFilesystem.getFileStatus(new Path(fileName)).getLen();
> returns the wrong length.  Apparently this is because this method fetches 
> stale length information from the NameNode.  Ideally, a call to 
> getFileStatus() would return the accurate file length by fetching the size of 
> the last block from the primary datanode.
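
A side note on the stale-length behavior in the quoted description, in case it is useful: {{getFileStatus().getLen()}} only reflects what the NameNode has recorded, which typically excludes the block still under construction. A rough client-side sketch of reading the visible length instead, assuming the Hadoop 2.x {{HdfsDataInputStream}} API and a hypothetical log path (neither comes from the original report):

{code:java}
// Hedged sketch: compare the NameNode-reported length with the visible length
// obtained from the input stream, which includes the under-construction block.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

public class VisibleLength {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path log = new Path("/hypertable/txn.log");  // hypothetical path
    long namenodeLen = fs.getFileStatus(log).getLen();  // may be stale while open
    try (FSDataInputStream in = fs.open(log)) {
      long visibleLen = (in instanceof HdfsDataInputStream)
          ? ((HdfsDataInputStream) in).getVisibleLength()
          : namenodeLen;
      System.out.println("NameNode length: " + namenodeLen
          + ", visible length: " + visibleLen);
    }
  }
}
{code}

This is only a read-side workaround; it does not change what {{getFileStatus()}} reports for a file that is still open.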



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
