imply-cheddar commented on code in PR #13394:
URL: https://github.com/apache/druid/pull/13394#discussion_r1040461494


##########
docs/dependencies/deep-storage.md:
##########
@@ -25,24 +25,47 @@ title: "Deep storage"
 
 Deep storage is where segments are stored.  It is a storage mechanism that 
Apache Druid does not provide.  This deep storage infrastructure defines the 
level of durability of your data, as long as Druid processes can see this 
storage infrastructure and get at the segments stored on it, you will not lose 
data no matter how many Druid nodes you lose.  If segments disappear from this 
storage layer, then you will lose whatever data those segments represented.
 
-## Local Mount
+## Local
 
-A local mount can be used for storage of segments as well.  This allows you to 
use just your local file system or anything else that can be mount locally like 
NFS, Ceph, etc.  This is the default deep storage implementation.
+Local storage is intended for use in the following situations:
 
-In order to use a local mount for deep storage, you need to set the following 
configuration in your common configs.
+- You have just one server.
+- Or, you have multiple servers, and they all have access to a shared 
filesystem (for example: NFS).
+
+In order to use a local directory for deep storage, set the following 
configuration in `common.runtime.properties`.
 
 |Property|Possible Values|Description|Default|
 |--------|---------------|-----------|-------|
-|`druid.storage.type`|local||Must be set.|
-|`druid.storage.storageDirectory`||Directory for storing segments.|Must be 
set.|
+|`druid.storage.type`|`local`||Must be set.|
+|`druid.storage.storageDirectory`|any local directory|Directory for storing 
segments. Must be different from `druid.segmentCache.locations` and 
`druid.segmentCache.infoDir`.|`/tmp/druid/localStorage`|
+|`druid.storage.zip`|`true`, `false`|Whether segments in 
`druid.storage.storageDirectory` are written as directories (`false`) or zip 
files (`true`).|`false`|

Review Comment:
   Personally, I'd be down for just making the segment storage stop zipping up 
the files.  It'd be good for a wide range of things, I think.  The only risk is 
that it might create extra data on deep storage for people, but I'm not sure 
that's really the top-of-mind concern for most people.



##########
server/src/main/java/org/apache/druid/segment/loading/LocalDataSegmentPusher.java:
##########
@@ -106,27 +130,61 @@ public DataSegment pushToPath(File dataSegmentFile, 
DataSegment segment, String
         throw new IOE("Failed to rename [%s] to [%s]", tmpIndexFile, 
indexFileTarget);
       }
 
-      return dataSegment;
+      return baseSegment.withLoadSpec(makeLoadSpec(new File(outDir, 
INDEX_ZIP_FILENAME).toURI()))
+                        .withSize(size);
     }
     finally {
-      FileUtils.deleteDirectory(tmpOutDir);
+      FileUtils.deleteDirectory(tmpSegmentDir);
     }
   }
 
-  @Override
-  public Map<String, Object> makeLoadSpec(URI finalIndexZipFilePath)
+  private DataSegment pushNoZip(final File inDir, final File outDir, final 
DataSegment baseSegment) throws IOException

Review Comment:
   I'm just curious about the pushing.  If the push creates a hard link to an 
ephemeral location that belonged to the task and then the task directory is 
deleted, what happens to that hard link?  Does the file continue to exist and 
only be linked from one location?



##########
docs/dependencies/deep-storage.md:
##########
@@ -25,24 +25,47 @@ title: "Deep storage"
 
 Deep storage is where segments are stored.  It is a storage mechanism that 
Apache Druid does not provide.  This deep storage infrastructure defines the 
level of durability of your data, as long as Druid processes can see this 
storage infrastructure and get at the segments stored on it, you will not lose 
data no matter how many Druid nodes you lose.  If segments disappear from this 
storage layer, then you will lose whatever data those segments represented.
 
-## Local Mount
+## Local
 
-A local mount can be used for storage of segments as well.  This allows you to 
use just your local file system or anything else that can be mount locally like 
NFS, Ceph, etc.  This is the default deep storage implementation.
+Local storage is intended for use in the following situations:
 
-In order to use a local mount for deep storage, you need to set the following 
configuration in your common configs.
+- You have just one server.
+- Or, you have multiple servers, and they all have access to a shared 
filesystem (for example: NFS).
+
+In order to use a local directory for deep storage, set the following 
configuration in `common.runtime.properties`.
 
 |Property|Possible Values|Description|Default|
 |--------|---------------|-----------|-------|
-|`druid.storage.type`|local||Must be set.|
-|`druid.storage.storageDirectory`||Directory for storing segments.|Must be 
set.|
+|`druid.storage.type`|`local`||Must be set.|
+|`druid.storage.storageDirectory`|any local directory|Directory for storing 
segments. Must be different from `druid.segmentCache.locations` and 
`druid.segmentCache.infoDir`.|`/tmp/druid/localStorage`|
+|`druid.storage.zip`|`true`, `false`|Whether segments in 
`druid.storage.storageDirectory` are written as directories (`false`) or zip 
files (`true`).|`false`|
+
+For example:
+
+```
+druid.storage.type=local
+druid.storage.storageDirectory=/tmp/druid/localStorage
+```
+
+When using local storage, it is recommended to use the same filesystem for 
`druid.storage.type` and
+`druid.segmentCache.locations` on your Historicals. When these are on the same 
filesystem, and
+`druid.storage.zip = false` (the default), Druid creates hard links between 
the two directories rather than storing
+two copies of the same data.
+
+Local storage cannot be used if you have multiple servers without a shared 
filesystem. In this case, consider using

Review Comment:
   Consider "Local storage requires a shared file system when used across 
multiple servers."



##########
docs/dependencies/deep-storage.md:
##########
@@ -25,24 +25,47 @@ title: "Deep storage"
 
 Deep storage is where segments are stored.  It is a storage mechanism that 
Apache Druid does not provide.  This deep storage infrastructure defines the 
level of durability of your data, as long as Druid processes can see this 
storage infrastructure and get at the segments stored on it, you will not lose 
data no matter how many Druid nodes you lose.  If segments disappear from this 
storage layer, then you will lose whatever data those segments represented.
 
-## Local Mount
+## Local
 
-A local mount can be used for storage of segments as well.  This allows you to 
use just your local file system or anything else that can be mount locally like 
NFS, Ceph, etc.  This is the default deep storage implementation.
+Local storage is intended for use in the following situations:
 
-In order to use a local mount for deep storage, you need to set the following 
configuration in your common configs.
+- You have just one server.
+- Or, you have multiple servers, and they all have access to a shared 
filesystem (for example: NFS).
+
+In order to use a local directory for deep storage, set the following 
configuration in `common.runtime.properties`.
 
 |Property|Possible Values|Description|Default|
 |--------|---------------|-----------|-------|
-|`druid.storage.type`|local||Must be set.|
-|`druid.storage.storageDirectory`||Directory for storing segments.|Must be 
set.|
+|`druid.storage.type`|`local`||Must be set.|
+|`druid.storage.storageDirectory`|any local directory|Directory for storing 
segments. Must be different from `druid.segmentCache.locations` and 
`druid.segmentCache.infoDir`.|`/tmp/druid/localStorage`|
+|`druid.storage.zip`|`true`, `false`|Whether segments in 
`druid.storage.storageDirectory` are written as directories (`false`) or zip 
files (`true`).|`false`|
+
+For example:
+
+```
+druid.storage.type=local
+druid.storage.storageDirectory=/tmp/druid/localStorage
+```
+
+When using local storage, it is recommended to use the same filesystem for 
`druid.storage.type` and
+`druid.segmentCache.locations` on your Historicals. When these are on the same 
filesystem, and
+`druid.storage.zip = false` (the default), Druid creates hard links between 
the two directories rather than storing
+two copies of the same data.

Review Comment:
   Personally, I wish that this would call out that manual setting of local 
storage in a production environment is an "advanced feature" that should be 
done intentionally.  Or maybe just a first sentence that pushes for deep 
storage more.  Something like
   
   ```
   Use of a cloud store is recommended for deep storage in any production 
system.  If you know what you are doing and are using local storage, consider 
setting the same filesystem ...
   ```
   
   I know the paragraph below calls out the cloud stuff as well, I don't think 
it hurts to mention it multiple times.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to