Dear Stavros,

Thank you very much for your reply. I will go through that log right away.

Best

Hailong

On Mon, Oct 22, 2012 at 4:07 AM, Volos Stavros <[email protected]> wrote:

> Dear Hailong,
>
> The frontend will ask for the summaries of the top documents. A backend
> node will receive a getSummary request for every top document it owns. You
> can go through the logs of the backend node and verify that it does
> receive getSummary requests.
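>
> For concreteness, the frontend-side call path looks roughly like this (a
> hypothetical sketch against the legacy Nutch 1.x searcher API; exact class
> and method names may differ in your version):
>
>     // Hypothetical sketch, legacy Nutch 1.x searcher API
>     // (org.apache.nutch.searcher); not taken from the CloudSuite sources.
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.nutch.searcher.*;
>     import org.apache.nutch.util.NutchConfiguration;
>
>     public class SummaryPathSketch {
>       public static void main(String[] args) throws Exception {
>         Configuration conf = NutchConfiguration.create();
>         NutchBean bean = new NutchBean(conf);
>         Query query = Query.parse("wikipedia", conf);
>         Hits hits = bean.search(query, 10);  // hits only: reads the indexes
>         int n = Math.min(10, hits.getLength());
>         Hit[] top = hits.getHits(0, n);
>         HitDetails[] details = bean.getDetails(top);
>         // Fetching the summaries is the step that fans a getSummary request
>         // out to every backend node owning one of the top documents
>         // (this is what reads the segments):
>         Summary[] summaries = bean.getSummary(details, query);
>         System.out.println(hits.getTotal() + " hits, "
>             + summaries.length + " summaries fetched");
>       }
>     }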
>
> Regards,
> -Stavros.
> ________________________________________
> From: Hailong Yang [[email protected]]
> Sent: Monday, October 22, 2012 10:38 AM
> To: Volos Stavros
> Cc: [email protected]; Lingjia Tang; Jason Mars
> Subject: Re: How to fit the index into the memory for the web search
> benchmark
>
> Dear Stavros,
>
> I am confused about why we need to bring the segments into memory. I
> examined the log file from the frontend server, which recorded the queries
> sent to and the responses received from the Nutch server. The log showed
> that the Nutch server only replied with how many hits were found in the
> crawled dataset and was never asked for the details of the page contents.
> That means that, when orchestrating the search, the NutchBean object never
> needs to call the getSummary method, which accesses the segments to
> retrieve the page contents. In other words, we don't need to care whether
> the segments can fit into memory for this specific web search workload in
> CloudSuite, right? Please correct me if I am wrong.
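>
> The count-only interaction recorded in the log would then correspond to
> just this much (a hypothetical sketch, again assuming the legacy Nutch 1.x
> searcher API):
>
>     // Hypothetical sketch (legacy Nutch 1.x searcher API): a count-only
>     // query goes through search(), which reads only the Lucene indexes;
>     // the segments would be read only if getSummary() were called later.
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.nutch.searcher.*;
>     import org.apache.nutch.util.NutchConfiguration;
>
>     public class CountOnlyQuery {
>       public static void main(String[] args) throws Exception {
>         Configuration conf = NutchConfiguration.create();
>         NutchBean bean = new NutchBean(conf);
>         Hits hits = bean.search(Query.parse("wikipedia", conf), 10);
>         System.out.println("total hits: " + hits.getTotal()); // no getSummary
>       }
>     }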
>
> Best
>
> Hailong
>
>
> On Sun, Oct 21, 2012 at 9:04 AM, Volos Stavros <[email protected]> wrote:
> Dear Hailong,
>
> The I/O activity you are seeing is due to the segments not fitting into
> memory.
>
> I would recommend reducing the size of your index so that the indexes and
> segments together occupy roughly 16GB.
>
> This is relatively easy to do if you used multiple reducer tasks during
> the crawling phase to create multiple partitions.
>
> (see Notes at http://parsa.epfl.ch/cloudsuite/search.html: The
> mapred.reduce.tasks property
> determines how many index and segment partitions will be created.)
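>
> If you drive the indexing job yourself, the reducer count can also be set
> programmatically (a hypothetical sketch using the old Hadoop mapred API;
> the usual route is simply passing -D mapred.reduce.tasks=N on the command
> line or setting the property in nutch-site.xml):
>
>     // Hypothetical sketch, old Hadoop mapred API: the number of reduce
>     // tasks determines how many index/segment partitions are produced.
>     import org.apache.hadoop.mapred.JobConf;
>
>     public class PartitionCount {
>       public static void main(String[] args) {
>         JobConf job = new JobConf();
>         job.setNumReduceTasks(4);  // same effect as -D mapred.reduce.tasks=4
>         System.out.println(job.getNumReduceTasks()
>             + " index/segment partitions will be created");
>       }
>     }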
>
> Regards,
> -Stavros.
> ________________________________________
> From: Hailong Yang [[email protected]]
> Sent: Friday, October 19, 2012 8:03 PM
> To: Volos Stavros
> Cc: [email protected]; Lingjia Tang; Jason Mars
> Subject: Re: How to fit the index into the memory for the web search
> benchmark
>
> Dear Stavros,
>
> Thank you for your reply. I understand the data structures required during
> the search. The 6GB is only the size of the actual index (the directory of
> indexes). The whole dataset, including the segments, amounts to 30GB.
>
> Best
>
> Hailong
>
> On Fri, Oct 19, 2012 at 9:03 AM, Volos Stavros <[email protected]> wrote:
> Dear Hailong,
>
> Two components are used when performing a query against the index-serving
> node:
> (a) the actual index (under indexes)
> (b) segments (under segments)
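>
> In a standard Nutch crawl directory the two sit side by side, roughly:
>
>     crawl/
>       indexes/    (a) Lucene index partitions, read by search()
>       segments/   (b) parsed page content, read by getSummary()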
>
> What exactly is 6GB? Are you including the segments as well?
>
> Regards,
> -Stavros.
>
>
> ________________________________________
> From: Hailong Yang [[email protected]]
> Sent: Wednesday, October 17, 2012 4:51 AM
> To: [email protected]
> Cc: Lingjia Tang; Jason Mars
> Subject: How to fit the index into the memory for the web search benchmark
>
> Hi CloudSuite,
>
> I am experimenting with the web search benchmark. However, I am wondering
> how to fit the index into memory in order to avoid unnecessary disk
> access. I have a 6GB index crawled from Wikipedia, and the machine has
> 16GB of RAM. During the workload execution, I noticed periodic 2% spikes
> in I/O utilization, and the memory used by the Nutch server was always
> less than 500MB. So I guess the whole index is not brought into memory by
> default before serving the search queries, right? Could you tell me how to
> do that exactly as you did in the "Clearing the Clouds" paper? Thanks!
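>
> For reference, one way I imagine forcing the index itself to be
> memory-resident (a hypothetical sketch against the Lucene 2.9/3.x
> generation that old Nutch builds on; the partition path is only an assumed
> example) would be:
>
>     // Hypothetical sketch (Lucene 2.9/3.x): copy an on-disk index
>     // partition into a RAMDirectory so query-time reads avoid the disk.
>     // Note this covers only the indexes; segment reads would still hit disk.
>     import java.io.File;
>     import org.apache.lucene.index.IndexReader;
>     import org.apache.lucene.store.FSDirectory;
>     import org.apache.lucene.store.RAMDirectory;
>
>     public class WarmIndex {
>       public static void main(String[] args) throws Exception {
>         // "crawl/indexes/part-00000" is an assumed partition path.
>         RAMDirectory ram = new RAMDirectory(
>             FSDirectory.open(new File("crawl/indexes/part-00000")));
>         IndexReader reader = IndexReader.open(ram);
>         System.out.println(reader.numDocs() + " documents resident in memory");
>         reader.close();
>       }
>     }
>
> Though I suppose simply letting the OS page cache hold the index files
> after a first pass over them would have a similar effect.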
>
>
> Best
>
> Hailong
>