soullkk opened a new issue, #14685:
URL: https://github.com/apache/druid/issues/14685

   Historical fails to load segments because the segment is reported as too large for storage
   ### Affected Version
   druid 24.0.1
   
   ### Description
   
   - The cluster has 3 nodes in total.
   - historical/runtime.properties:7:druid.segmentCache.locations=[{"path":"/srv/druid/var/druid9","maxSize":32862064640},{"path":"/srv/druid/var/druid8","maxSize":32862064640},{"path":"/srv/druid/var/druid7","maxSize":32862064640},{"path":"/srv/druid/var/druid6","maxSize":32862064640},{"path":"/srv/druid/var/druid5","maxSize":32862064640},{"path":"/srv/druid/var/druid4","maxSize":32862064640},{"path":"/srv/druid/var/druid3","maxSize":32862064640},{"path":"/srv/druid/var/druid2","maxSize":32862064640},{"path":"/srv/druid/var/druid1","maxSize":32862064640},{"path":"/srv/druid/var/druid10","maxSize":32862064640},{"path":"/srv/druid/var/druid12","maxSize":32862064640},{"path":"/srv/druid/var/druid11","maxSize":32862064640}]
   - I do not know how to reproduce this problem.
   - Historical log:
   2023-07-27 01:42:21,740 WARN  
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation] 
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
 too large for storage[/srv/druid/var/druid5:1,542]. Check your 
druid.segmentCache.locations maxSize param
   2023-07-27 01:42:21,740 WARN  
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation] 
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
 too large for storage[/srv/druid/var/druid1:1,393]. Check your 
druid.segmentCache.locations maxSize param
   2023-07-27 01:42:21,740 WARN  
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation] 
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
 too large for storage[/srv/druid/var/druid10:808]. Check your 
druid.segmentCache.locations maxSize param
   2023-07-27 01:42:21,740 WARN  
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation] 
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
 too large for storage[/srv/druid/var/druid3:781]. Check your 
druid.segmentCache.locations maxSize param
   2023-07-27 01:42:21,740 WARN  
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation] 
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
 too large for storage[/srv/druid/var/druid4:573]. Check your 
druid.segmentCache.locations maxSize param
   2023-07-27 01:42:21,740 WARN  
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation] 
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
 too large for storage[/srv/druid/var/druid6:459]. Check your 
druid.segmentCache.locations maxSize param
   2023-07-27 01:42:21,740 WARN  
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation] 
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
 too large for storage[/srv/druid/var/druid9:451]. Check your 
druid.segmentCache.locations maxSize param
   2023-07-27 01:42:21,740 WARN  
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.StorageLocation] 
Segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
 too large for storage[/srv/druid/var/druid8:420]. Check your 
druid.segmentCache.locations maxSize param
   2023-07-27 01:42:21,741 WARN  
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.SegmentLocalCacheManager]
 Asked to cleanup 
something[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z]
 that didn't exist.  Skipping.
   2023-07-27 01:42:21,741 WARN  
[ZKCoordinator--8][ROOT][org.apache.druid.server.coordination.BatchDataSegmentAnnouncer]
 No path to unannounce 
segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z]
   2023-07-27 01:42:21,741 INFO  
[ZKCoordinator--8][ROOT][org.apache.druid.server.SegmentManager] Told to delete 
a queryable on dataSource[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT] for 
interval[2023-07-24T02:00:00.000Z/2023-07-24T02:15:00.000Z] and 
version[2023-07-24T02:15:05.987Z] that I don't have.
   2023-07-27 01:42:21,741 WARN  
[ZKCoordinator--8][ROOT][org.apache.druid.segment.loading.SegmentLocalCacheManager]
 Asked to cleanup 
something[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z]
 that didn't exist.  Skipping.
   2023-07-27 01:42:21,741 WARN  
[ZKCoordinator--8][ROOT][org.apache.druid.server.coordination.SegmentLoadDropHandler]
 Unable to delete 
segmentInfoCacheFile[/srv/druid/var/druid9/info_dir/ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z]
   2023-07-27 01:42:21,741 ERROR 
[ZKCoordinator--8][ROOT][org.apache.druid.server.coordination.SegmentLoadDropHandler]
 Failed to load segment for dataSource: 
{class=org.apache.druid.server.coordination.SegmentLoadDropHandler, 
exceptionType=class org.apache.druid.segment.loading.SegmentLoadingException, 
exceptionMessage=Exception loading 
segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z],
 segment=DataSegment{binaryVersion=9, 
id=ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z,
 loadSpec={type=>hdfs, 
path=>hdfs://hacluster/srv/bigdata/druid/segments/ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT/20230724T020000.000Z_20230724T021500.000Z/2023-07-24T02_15_05.987Z/0_bff76d6a-fbec-46bf-b0a5-cc94c50ea9ec_index.zip},
 dimensions=[fabric_id, ne_dn, ne_name, ne_ip, slot_id, slot_name, 
slot_uniq_id, is_multi_slot, mac, slot_query_id, device_role]
 , metrics=[cpu_usage, cpu_effcnt, mem_usage, mem_effcnt, period_effect, 
period_ctn, deviceTime, count], shardSpec=NumberedShardSpec{partitionNum=0, 
partitions=0}, lastCompactionState=null, size=6994}}
   org.apache.druid.segment.loading.SegmentLoadingException: Exception loading 
segment[ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z]
           at 
org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:289)
 ~[druid-server-24.0.1-htrunk6.jar:?]
           at 
org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:266)
 ~[druid-server-24.0.1-htrunk6.jar:?]
           at 
org.apache.druid.server.coordination.SegmentLoadDropHandler.addSegment(SegmentLoadDropHandler.java:343)
 ~[druid-server-24.0.1-htrunk6.jar:?]
           at 
org.apache.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:61)
 ~[druid-server-24.0.1-htrunk6.jar:?]
           at 
org.apache.druid.server.coordination.ZkCoordinator.lambda$childAdded$2(ZkCoordinator.java:150)
 ~[druid-server-24.0.1-htrunk6.jar:?]
           at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[?:1.8.0_372]
           at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
~[?:1.8.0_372]
           at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
~[?:1.8.0_372]
           at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
~[?:1.8.0_372]
           at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_372]
   Caused by: org.apache.druid.segment.loading.SegmentLoadingException: Failed 
to load segment 
ODAEDATASET__DEFAULT_fi_dc_kpi_ne_raw__DEFAULT_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z
 in all locations.
           at 
org.apache.druid.segment.loading.SegmentLocalCacheManager.loadSegmentWithRetry(SegmentLocalCacheManager.java:279)
 ~[druid-server-24.0.1-htrunk6.jar:?]
           at 
org.apache.druid.segment.loading.SegmentLocalCacheManager.getSegmentFiles(SegmentLocalCacheManager.java:229)
 ~[druid-server-24.0.1-htrunk6.jar:?]
           at 
org.apache.druid.segment.loading.SegmentLocalCacheLoader.getSegment(SegmentLocalCacheLoader.java:56)
 ~[druid-server-24.0.1-htrunk6.jar:?]
           at 
org.apache.druid.server.SegmentManager.getSegmentReference(SegmentManager.java:325)
 ~[druid-server-24.0.1-htrunk6.jar:?]
           at 
org.apache.druid.server.SegmentManager.loadSegment(SegmentManager.java:268) 
~[druid-server-24.0.1-htrunk6.jar:?]
           at 
org.apache.druid.server.coordination.SegmentLoadDropHandler.loadSegment(SegmentLoadDropHandler.java:281)
 ~[druid-server-24.0.1-htrunk6.jar:?]
           ... 9 more
   - The log indicates that storageLocation.currSizeBytes is close to 
maxSizeBytes, so no storage location has room to load the segment. However, 
the cache directories still have plenty of free space. The cache directory 
information is as follows:
   
![image](https://github.com/apache/druid/assets/55041925/2980b84c-a1b0-45ef-b7f9-657c8069142a)
   
![image](https://github.com/apache/druid/assets/55041925/549496bf-ceb5-4c40-8d8c-b7ed5f0f1e23)
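To compare the warnings in the log with the directory listings, the per-location numbers can be pulled out of the log text. This is a minimal sketch, assuming the second number in `storage[path:N]` is the space Druid believes is still available in that location (this matches how the log is read in this issue). The helper name `parse_too_large_warnings` and the shortened segment id `example_ds_...` are hypothetical.

```python
import re

# One warning may be wrapped across several lines, so \s* is allowed between
# the "Segment[...]" part and the "too large for storage[...]" part.
WARNING = re.compile(
    r"Segment\[(?P<segment>[^\]]+):(?P<size>[\d,]+)\]\s*"
    r"too large for storage\[(?P<path>[^\]:]+):(?P<available>[\d,]+)\]"
)


def parse_too_large_warnings(log_text):
    """Return one dict per 'too large for storage' warning found in log_text."""
    rows = []
    for match in WARNING.finditer(log_text):
        size = int(match["size"].replace(",", ""))
        available = int(match["available"].replace(",", ""))
        rows.append(
            {
                "segment": match["segment"],
                "path": match["path"],
                "size": size,
                "available": available,
                # Bytes missing in this location for the segment to fit.
                "shortfall": size - available,
            }
        )
    return rows


# Two warnings in the same wrapped layout as the log in this issue
# (segment id shortened; sizes and paths taken from the log).
SAMPLE_LOG = """\
Segment[example_ds_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
 too large for storage[/srv/druid/var/druid5:1,542]. Check your
druid.segmentCache.locations maxSize param
Segment[example_ds_2023-07-24T02:00:00.000Z_2023-07-24T02:15:00.000Z_2023-07-24T02:15:05.987Z:6,994]
 too large for storage[/srv/druid/var/druid10:808]. Check your
druid.segmentCache.locations maxSize param
"""

rows = parse_too_large_warnings(SAMPLE_LOG)
for row in rows:
    print(row["path"], row["available"], row["shortfall"])
```

Each reported value can then be set against the configured `maxSize` of that location and against the disk usage shown in the listings.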
   
   I exported a heap dump of the Historical for analysis and found duplicate 
directories in the storageLocation.files of different storage locations. I do 
not think this is expected.
   SELECT location.files.map.size, location.currSizeBytes, location.files FROM 
org.apache.druid.segment.loading.StorageLocation location WHERE 
(location.currSizeBytes > 0)
   
![image](https://github.com/apache/druid/assets/55041925/f6df871e-a773-4475-a40e-9237d8a4cfcb)
   SELECT file.path.toString(), file.path.toString().substring(21) FROM 
java.io.File file WHERE ((file.path.toString().contains("/srv/druid/var/druid") 
= true) and (file.path.toString().contains("smoosh") = false))
   
![image](https://github.com/apache/druid/assets/55041925/7b792c9b-fd42-4493-bc81-fb9281a9f300)
   
![image](https://github.com/apache/druid/assets/55041925/9e76441a-5078-438b-9628-53d7b66def78)
   
![image](https://github.com/apache/druid/assets/55041925/cb76befa-6591-47ab-9483-187892c398ec)
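The duplicate check from the heap dump can also be repeated directly on disk by comparing paths relative to each cache location root. This is a minimal sketch, assuming a segment occupies the same relative directory in whichever location holds it and that `info_dir` should be ignored. `find_duplicate_segment_dirs` is a hypothetical helper, not part of Druid.

```python
import os
from collections import defaultdict


def find_duplicate_segment_dirs(roots, skip=("info_dir",)):
    """Map relative directory -> list of roots, for directories that contain
    files under more than one cache location root."""
    seen = defaultdict(list)
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune directories that are not segment data (assumption: info_dir).
            dirnames[:] = [d for d in dirnames if d not in skip]
            rel = os.path.relpath(dirpath, root)
            # Only directories that directly hold files are counted.
            if filenames and rel != ".":
                seen[rel].append(root)
    return {rel: locs for rel, locs in seen.items() if len(locs) > 1}


if __name__ == "__main__":
    # Location roots as configured in druid.segmentCache.locations above.
    roots = [f"/srv/druid/var/druid{i}" for i in range(1, 13)]
    for rel, locs in sorted(find_duplicate_segment_dirs(roots).items()):
        print(rel, "->", ", ".join(locs))
```

Any path printed here exists in more than one location on disk, which can be compared with the duplicates seen in the heap dump.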
   
   There are 66347 segments in the DB, and the total segment size is 54GB.
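For scale, the configured cache capacity of one Historical can be set against the reported total segment size. This is a rough sketch, assuming "54GB" means decimal gigabytes and ignoring replication and how segments are spread over the three nodes. `total_cache_capacity` is a hypothetical helper.

```python
import json


def total_cache_capacity(property_line):
    """Sum maxSize over all entries of a druid.segmentCache.locations line.

    Everything after the first '=' is parsed as a JSON array, so a grep-style
    prefix such as 'historical/runtime.properties:7:' is harmless.
    """
    _, _, value = property_line.partition("=")
    locations = json.loads(value)
    return sum(loc["maxSize"] for loc in locations), len(locations)


# Rebuild the property from this issue: 12 locations, 32862064640 bytes each.
paths = [f"/srv/druid/var/druid{i}" for i in range(1, 13)]
line = "historical/runtime.properties:7:druid.segmentCache.locations=" + json.dumps(
    [{"path": p, "maxSize": 32862064640} for p in paths]
)

capacity, count = total_cache_capacity(line)
reported_total = 54 * 10**9  # "54GB" from the issue, read as decimal GB (assumption)

print(f"{count} locations, {capacity} bytes configured per Historical")
print(f"configured capacity / reported segment total = {capacity / reported_total:.1f}")
```

With these numbers the configured capacity of a single Historical (394344775680 bytes) is several times the reported total segment size.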
   

