As described in my previous message (quoted below), the rebuild failed after some hours. We found that the cause is a timeout: the incoming socket on the transfer (sending) side hit its read timeout (streaming_socket_timeout_in_ms, which defaults to one hour), and then the whole StreamSession failed.
As the rebuild goes on, the transfer rate slows down, and the file being transferred cannot be completed within the timeout. During that time the transfer side does not receive any bytes (it is waiting for the RECEIVED message), so the incoming socket raises a timeout. Since the incoming and outgoing connections belong to the same StreamSession, I think the timeout should not be decided by checking the incoming socket alone while the outgoing side is still streaming (a large file at low speed keeps transferring for a long time). In other words, a timeout should not be raised while a file is still being transferred.

My question again: will re-running rebuild stream all of the token ranges that belong to the node, or only the ranges left over from the last rebuild (since the last rebuild already fetched some data)?

Please excuse my poor English.

===========================================================================

At 2015-11-21 01:07:05, "wateray" <wate...@163.com> wrote:
>We want to deploy one more data center for data safety.
>While we were rebuilding one node's data from the old DC, the rebuild failed
>after some hours due to a network fault.
>I can certainly restart the rebuild, but I am worried about what a restart does:
>does it rebuild all of the token ranges that belong to the node, or only the
>remaining token ranges from the last rebuild (since the last rebuild already got some data)?
>
>Looking at the source, I see this code.
>
>Class RangeStreamer, method getRangeFetchMap:
>
>private static Multimap<InetAddress, Range<Token>> getRangeFetchMap(Multimap<Range<Token>, InetAddress> rangesWithSources,
>                                                                    Collection<ISourceFilter> sourceFilters, String keyspace)
>{
>    Multimap<InetAddress, Range<Token>> rangeFetchMapMap = HashMultimap.create();
>    for (Range<Token> range : rangesWithSources.keySet())
>    {
>        boolean foundSource = false;
>
>        outer:
>        for (InetAddress address : rangesWithSources.get(range))
>        {
>            if (address.equals(FBUtilities.getBroadcastAddress()))
>            {
>                // If localhost is a source, we have found one, but we don't add it to the map to avoid streaming locally
>                foundSource = true;
>                continue;
>            }
>
>            for (ISourceFilter filter : sourceFilters)
>            {
>                if (!filter.shouldInclude(address))
>                    continue outer;
>            }
>
>            rangeFetchMapMap.put(address, range);
>            foundSource = true;
>            break; // ensure we only stream from one other node for each range
>        }
>
>        if (!foundSource)
>            throw new IllegalStateException("unable to find sufficient sources for streaming range " + range + " in keyspace " + keyspace);
>    }
>
>    return rangeFetchMapMap;
>}
>
>In the bold lines (where the address is found to be localhost), the loop
>continues on to the other addresses and then puts one of them into rangeFetchMapMap.
>I think the continue keyword should be break, if the intent is to rebuild only
>the data the node doesn't have. Is that right?
>
>Best regards!
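
To make the continue-versus-break question concrete, here is a minimal standalone sketch of the same loop shape. It is not the Cassandra code: the class FetchMapSketch, the method pickSources, and the stopAtLocal flag are hypothetical names made up for illustration, ranges and addresses are plain strings, the source filters are left out, and the result maps each range to one source instead of using a Multimap. It only shows which ranges end up with a remote source under each keyword.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FetchMapSketch {
    // Hypothetical simplification of the quoted loop: for each range, walk its
    // candidate sources in order and pick at most one remote source.
    static Map<String, String> pickSources(Map<String, List<String>> rangesWithSources,
                                           String localAddress,
                                           boolean stopAtLocal) {
        Map<String, String> chosen = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : rangesWithSources.entrySet()) {
            boolean foundSource = false;
            for (String address : entry.getValue()) {
                if (address.equals(localAddress)) {
                    foundSource = true;
                    if (stopAtLocal)
                        break;    // proposed variant: stop looking, range gets no remote source
                    continue;     // quoted behaviour: skip localhost, keep looking
                }
                chosen.put(entry.getKey(), address);
                foundSource = true;
                break;            // only one remote source per range
            }
            if (!foundSource)
                throw new IllegalStateException("no source for range " + entry.getKey());
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Made-up example data: the local node 10.0.0.1 is listed for the first range only.
        Map<String, List<String>> ranges = new LinkedHashMap<>();
        ranges.put("(0,100]", Arrays.asList("10.0.0.1", "10.0.0.2"));
        ranges.put("(100,200]", Arrays.asList("10.0.0.2", "10.0.0.3"));
        String local = "10.0.0.1";

        System.out.println(pickSources(ranges, local, false)); // {(0,100]=10.0.0.2, (100,200]=10.0.0.2}
        System.out.println(pickSources(ranges, local, true));  // {(100,200]=10.0.0.2}
    }
}
```

In this sketch, continue still fetches a range from a remote node even when localhost is listed as a source, while break skips that range only if localhost happens to come before a remote candidate in the iteration order. Note that the check compares addresses; in this sketch it says nothing about how much data an earlier, interrupted rebuild already transferred.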