[
https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14253948#comment-14253948
]
Colin Patrick McCabe commented on HDFS-7056:
--------------------------------------------
{code}
public boolean truncate(String src, long newLength, String clientName)
{code}
How about calling the first argument "target" rather than "src"? "src" and
"dst" make sense for {{rename}}, but not so much for truncate, since there is
no "destination".
I would still like to see these functions return an enum like:
{code}
enum TruncateResult { FINISHED, IN_PROGRESS }
{code}
The problem with returning a boolean is that people will assume that if it
returns false, the operation failed. This is how {{File#delete}} works, how
{{File#mkdir}} works, how pretty much everything in the Java standard library
works. We have some stuff like this in Hadoop's {{FileSystem.java}} as well.
So it will be confusing to users if our API returns a boolean that means
something different. But by returning an enum, we make it clear what is going
on. Of course, we can still use booleans in the protobuf if that's more
convenient. This is just a user API thing.
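To make the distinction concrete, here is a minimal sketch of the enum-returning shape being suggested. The class, the toy block size, and the boundary check are all illustrative stand-ins, not the actual HDFS signature:

```java
// Sketch of the suggested user-facing truncate API. Names and the toy
// block-boundary logic are illustrative, not the real HDFS code.
public class TruncateApiSketch {

    // Returning an enum makes the "still recovering" case explicit,
    // unlike a boolean that readers may mistake for success/failure.
    public enum TruncateResult { FINISHED, IN_PROGRESS }

    public static TruncateResult truncate(long currentLength, long newLength) {
        if (newLength == currentLength) {
            return TruncateResult.FINISHED;   // nothing to do
        }
        // A truncate that lands mid-block needs block recovery first,
        // so the new length only becomes visible once recovery completes.
        boolean onBlockBoundary = newLength % 128 == 0; // toy block size
        return onBlockBoundary ? TruncateResult.FINISHED
                               : TruncateResult.IN_PROGRESS;
    }
}
```

With this shape, a caller who sees {{IN_PROGRESS}} knows to wait for recovery rather than conclude the call failed.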
{code}
* @param bc file
+ * @param lastBlockDelta num of bytes to remove from block
* @return the last block locations if the block is partial or null otherwise
*/
public LocatedBlock convertLastBlockToUnderConstruction(
- BlockCollection bc) throws IOException {
+ BlockCollection bc, long lastBlockDelta) throws IOException {
BlockInfo oldBlock = bc.getLastBlock();
if(oldBlock == null ||
- bc.getPreferredBlockSize() == oldBlock.getNumBytes())
+ bc.getPreferredBlockSize() == oldBlock.getNumBytes() - lastBlockDelta)
{code}
How about calling this {{bytesToRemove}} rather than {{lastBlockDelta}}? When
I think of "delta," I think of something that could be either positive or
negative.
{code}
}
+    // If we are performing a truncate recovery then set recovery fields
+    // to old block.
+ boolean truncateRecovery = b.getRecoveryBlock() != null;
+ boolean copyOnTruncateRecovery = truncateRecovery &&
+ b.getRecoveryBlock().getBlockId() != b.getBlockId();
+ ExtendedBlock primaryBlock = (copyOnTruncateRecovery) ?
+ new ExtendedBlock(blockPoolId, b.getRecoveryBlock()) :
+ new ExtendedBlock(blockPoolId, b);
    // If we only get 1 replica after eliminating stale nodes, then choose all
{code}
{{recoveryBlock}} seems like a very generic name for this. It seems that
{{BlockInfoUnderConstruction#recoveryBlock}} is only used for truncate, never
for anything else. So let's call it {{BlockInfoUnderConstruction#truncateDst}}
or something.
{code}
message RecoveringBlockProto {
required uint64 newGenStamp = 1; // New genstamp post recovery
required LocatedBlockProto block = 2; // Block to be recovered
+ optional BlockProto newBlock = 3; // New block for recovery (truncate)
}
{code}
Same comment here. Since this is only for truncate, put truncate in the name
somewhere.
{code}
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/InterDatanodeProtocol.proto
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/InterDatanodeProtocol.proto
@@ -59,6 +59,7 @@ message UpdateReplicaUnderRecoveryRequestProto {
required ExtendedBlockProto block = 1; // Block identifier
required uint64 recoveryId = 2; // New genstamp of the replica
required uint64 newLength = 3; // New length of the replica
+ optional uint64 newBlockId = 4; // New blockId for copy (truncate)
}
{code}
There isn't any default given here for {{newBlockId}}, so what's going to
happen in a mixed-version scenario, where older datanodes are communicating
with newer ones? Like in this code:
{code}
storageID = impl.updateReplicaUnderRecovery(
PBHelper.convert(request.getBlock()), request.getRecoveryId(),
request.getNewBlockId(), request.getNewLength());
{code}
It will either throw a NullPointerException or return 0 unexpectedly, right?
I would say just pass in a {{Long}} rather than a {{long}}, and check for
null. If it's {{null}}, then don't do anything to the id.
This would also mean that you could just pass null (i.e. don't pass anything
for the optional value) rather than re-setting the block's ID to itself, when
doing a non-truncate recovery.
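A minimal sketch of the null-check being suggested. It assumes the standard proto2 pattern where an optional field gets {{hasNewBlockId()}}/{{getNewBlockId()}} accessors, so the caller can map "field absent" to {{null}}; the class and method names here are illustrative:

```java
// Sketch of handling an optional protobuf field with a boxed Long.
// A request from an old datanode simply won't set newBlockId, and the
// caller would translate that absence into null, e.g.:
//   Long newBlockId = request.hasNewBlockId()
//       ? request.getNewBlockId() : null;
public class UpdateReplicaSketch {

    // Boxed Long distinguishes "field absent" from a real block id,
    // instead of silently reading the primitive default of 0.
    public static long resolveBlockId(long currentBlockId, Long newBlockId) {
        if (newBlockId == null) {
            return currentBlockId;  // non-truncate recovery: keep the id
        }
        return newBlockId;          // copy-on-truncate: switch to new id
    }
}
```

This also gives the non-truncate path a natural encoding: just don't set the optional field, rather than re-sending the block's own id.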
{code}
boolean copyOnTruncate = rur.getBlockId() != newBlockId;
{code}
Can we have an accessor for this in {{ReplicaUnderRecovery}}? I see this
pattern in a lot of places.
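The suggested accessor could look something like the following sketch. This is a hypothetical minimal stand-in for {{ReplicaUnderRecovery}}, reduced to just the two ids involved:

```java
// Hypothetical minimal stand-in for ReplicaUnderRecovery, showing the
// suggested accessor that centralizes the copy-on-truncate check.
public class ReplicaUnderRecoverySketch {
    private final long blockId;
    private final long recoveryBlockId;

    public ReplicaUnderRecoverySketch(long blockId, long recoveryBlockId) {
        this.blockId = blockId;
        this.recoveryBlockId = recoveryBlockId;
    }

    // One named method replaces the "ids differ => copy-on-truncate"
    // comparison that the patch currently repeats at several call sites.
    public boolean isCopyOnTruncate() {
        return blockId != recoveryBlockId;
    }
}
```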
{code}
* 2.) INode length is truncated now – clients can only read up to new length.
{code}
Should be "*new* clients can only read up to the truncated length."
Old clients may be able to read above the truncated length for a while.
{code}
@Override
public void writeFields(DataOutputStream out) throws IOException {
FSImageSerialization.writeString(src, out);
FSImageSerialization.writeString(clientName, out);
FSImageSerialization.writeString(clientMachine, out);
FSImageSerialization.writeLong(newLength, out);
FSImageSerialization.writeLong(timestamp, out);
int size = truncateBlock != null ? 1 : 0;
Block[] blocks = new Block[size];
if (truncateBlock != null) {
blocks[0] = truncateBlock;
}
FSImageSerialization.writeCompactBlockArray(blocks, out);
}
{code}
This code writes out all of the fields in truncateBlock: {{blockId}},
{{numBytes}}, {{generationStamp}}. But we only care about {{blockId}} and
{{generationStamp}}, right? It might be better just to manually write out
{{blockId}} and {{generationStamp}}, rather than using
{{FSImageSerialization.writeCompactBlockArray}}.
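For illustration, a sketch of what writing only the two needed fields might look like. The presence flag and field layout are assumptions for this sketch, not the actual FSImage edit-log format:

```java
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of writing only blockId and generationStamp, instead of
// serializing a whole Block (including numBytes) via
// writeCompactBlockArray. Layout here is illustrative only.
public class TruncateBlockWriterSketch {

    public static void writeTruncateBlock(DataOutputStream out,
            Long blockId, Long genStamp) throws IOException {
        boolean present = blockId != null;
        out.writeBoolean(present);      // presence flag replaces the 0/1-length array
        if (present) {
            out.writeLong(blockId);     // only the fields we care about
            out.writeLong(genStamp);    // numBytes intentionally omitted
        }
    }
}
```

The reader side would mirror this: read the flag, then conditionally read the two longs back.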
{code}
if(oldLength == newLength)
return true;
if(oldLength < newLength)
throw new HadoopIllegalArgumentException(
"Cannot truncate to a larger file size. Current size: " + oldLength +
", truncate size: " + newLength + ".");
{code}
Missing braces around the "if". The same applies in a few other places.
{{INodeFile}}: can these changes be split out into a separate patch? They seem
like they could be done as a standalone patch since they just add utility
methods and so on. Same with the changes in {{AbstractINodeDiffList}} and some
of the other snapshot stuff.
> Snapshot support for truncate
> -----------------------------
>
> Key: HDFS-7056
> URL: https://issues.apache.org/jira/browse/HDFS-7056
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Konstantin Shvachko
> Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107-HDFS-7056-combined.patch,
> HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch,
> HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch,
> HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch,
> HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch,
> HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch,
> HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch,
> HDFSSnapshotWithTruncateDesign.docx
>
>
> Implementation of truncate in HDFS-3107 does not allow truncating files
> which are in a snapshot. It is desirable to be able to truncate a file and
> still keep its old state in the snapshot.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)