Re: [Neo4j] CPU spikes really high as the data size increases to 50 MB

Arun Kumar Sun, 10 May 2015 18:29:37 -0700

Thanks Michael.. We will immediately start working on upgrading the Neo4j 
version..


At any given time we would have around 20 movies on an average in the 
system.. Not much..

"This is another cross-product query. What is "a" a movie or actor? you 
should make sure that the lookup of a is done via a schema index
make sure that the minimum a are selected.

start n=node:node_auto_index(customerId = 
'a899573d-3555-4c9d-ac1b-3f070a7decd7') MATCH a WHERE NOT (n)-[:LIKES]->(a) 
and  
a.categoryId='26'  and a.State='New Jersey' and a.PostType='Offering'  and 
a.POST_STATUS='A' return a.id as postId order by a.Zipcode  limit 5 "

>>> This query is not for movies.. this is recommendation of 'post' (or 
service) based on members browsing.. Once member clicks on any 'post', 
based on the 'state' (NJ, NY..), we will recommend other 'posts' from the 
same state sort by zipcode.. 

When you say 'make sure that minimum a are selected', does it mean by using 
labels? My understanding is, we are selecting minimum 'a' based on the 
'where' clause.. please let me know if my understanding is right?

Also, when I use 'node_auto_index(customerId = 
'a899573d-3555-4c9d-ac1b-3f070a7decd7')', 
am I not limiting the result set based on the customer Id? 

Thanks,
Arun.

On Sunday, May 10, 2015 at 9:02:52 PM UTC-4, Michael Hunger wrote:
>
> You should really update to a newer version of Neo4j.
>
> With 2.2 you also get visual query plan profiling, which should help you a 
> lot. Most of your queries create way too much intermediate data.
>
> Perhaps also get some hands on consulting / help for writing your queries.
>
> Michael
>
> Some tips inline
>
>
> On 10.05.2015 at 23:58, Arun Kumar <[email protected]> wrote:
>
> Michael,
>
> Thanks for looking in to this.. 
>
> We use Neo4j as recommendation engine... We have movies, classifieds 
> services listed in our site.. We recommend movies or classifieds to our 
> customers based on their browsing behaviors... Below are some of the CQL's, 
> we use..
>
> 1. Movie recommendation CQL..
>
>  start 
> n=node:node_auto_index('customerId:*'),n1=node:node_auto_index(customerId = 
> {customerId}) 
>  MATCH n-[:LIKES]->movie where movie.Language = {language} and 
> movie.MOVIE_STATUS={status} 
>  with DISTINCT movie as mov,n1 where not(n1-[:LIKES]->mov)  
>  with mov as movDet, count(mov) as movCt return 
> movCt,movDet.Movie,movDet.Language order by movCt desc
>
> -> this query creates a huge cross product
> -> you create a row for each customer and every movie that they ever liked, 
> which might be millions
> -> I would rather look up the movie by language and status and follow the 
> LIKES relationship backwards
> -> the WHERE NOT is an expensive operation for each pair
> -> you create too many paths in between; you should use DISTINCT or an 
> aggregation (even in your query)
> -> you should add a limit to the result
>
> -> how many movies do you have in the database?
>
> start movie=node:node_auto_index('language:EN 
> status:ACTIVE'),n1=node:node_auto_index(customerId = {customerId}) 
>  MATCH n-[:LIKES]->movie
>  with distinct movie as mov,n1
>  where not(n1-[:LIKES]->mov)  
>  with mov as movDet, count(mov) as movCt
>  return movCt,movDet.Movie,movDet.Language
>  order by movCt desc
>
>
>
> >>> Below query is used to identify the members language..
>
>
> -> this query is not correct, as the count would always be 1; you don't 
> aggregate by the same data you group by
>
>  
> start n=node:node_auto_index(customerId = {customerId}) 
> MATCH n-[:LIKES]->movie return movie.Language,count(movie.Language) order 
> by count(movie.Language) desc 
>
> 2. Other recommendation CQL..
>
>
> This is another cross-product query.
> What is "a" a movie or actor?
>
> you should make sure that the lookup of a is done via a schema index
> make sure that the minimum a are selected.
>
>
> start n=node:node_auto_index(customerId = 
> 'a899573d-3555-4c9d-ac1b-3f070a7decd7')
>
> MATCH a WHERE NOT (n)-[:LIKES]->(a) and  
> a.categoryId='26'  and a.State='New Jersey' and a.PostType='Offering'  and 
> a.POST_STATUS='A' return a.id as postId order by a.Zipcode  limit 5
>
>
> Make sure to do the same as above.
>
> Going over all customers doesn't make sense. Just remove the auto-index 
> lookup, and make sure you use a label like a:Post and a schema index for 
> the categoryId.
>
> Your result also doesn't make sense, as you again aggregate by the same 
> thing you return.
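>
> A minimal sketch of the label and schema index advice, applied to the post 
> recommendation query (2). It assumes Neo4j 2.x syntax, that post nodes have 
> been given the :Post label, and that customer nodes carry a :Customer 
> label. The :Customer label and the {state} and {categoryId} parameters are 
> hypothetical names used only for illustration.
>
> ```cypher
> // Schema indexes so both lookups start from an index, not a full scan.
> // Run each statement once, separately.
> CREATE INDEX ON :Customer(customerId);
> CREATE INDEX ON :Post(categoryId);
>
> // Look up the single customer and the candidate posts via their labels,
> // then drop the posts this customer already likes.
> MATCH (n:Customer {customerId: {customerId}})
> MATCH (a:Post {categoryId: {categoryId}})
> WHERE a.State = {state}
>   AND a.PostType = 'Offering'
>   AND a.POST_STATUS = 'A'
>   AND NOT (n)-[:LIKES]->(a)
> RETURN a.id AS postId
> ORDER BY a.Zipcode
> LIMIT 5
> ```
>
> Without a label, "MATCH a" has to consider every node in the store before 
> the WHERE clause filters anything. With :Post and the index, only posts in 
> the requested category are considered.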
>
> 3. Trending CQL..
>
> start n=node:node_auto_index('customerId :*') 
> MATCH a WHERE (n)-[:LIKES]->(a) and  a.categoryId={categoryId} return a.id 
> as postId, count(a) order by count(a) DESC limit 2
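>
> One possible shape for the trending query without the lookup over all 
> customers, again assuming a hypothetical :Post label with a schema index 
> on :Post(categoryId):
>
> ```cypher
> // Start from the posts in one category and count incoming LIKES,
> // instead of starting from every customer in the auto index.
> MATCH (a:Post {categoryId: {categoryId}})<-[:LIKES]-()
> RETURN a.id AS postId, count(*) AS likes
> ORDER BY likes DESC
> LIMIT 2
> ```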
>
>
> 4. Identifying last viewed posts..
>
>
> you forgot the label on your customer/user so the index will not be used, 
> same for "a"
>
>
> match (n {customerId : 
> 'a899573d-3555-4c9d-ac1b-3f070a7decd7'})-[:LIKES]->(a)  where 
> a.categoryId='26' and a.POST_STATUS='A' 
> return a.PostType as PostType,a.Stay as Stay,a.Salary as Salary,a.Age as 
> Age,a.Language as Language,a.Experience as Experience,
> a.State as state order by a.POST_VIEWED_TIME desc limit 1
>
> All these queries will be fired for each member when they move across each 
> page.. At any given point of time we would have 20 members on an average in 
> the site and get monthly 400K page views.. Not much though...
>
>
> Make sure first that your queries are in the 10-100 ms range and don't 
> generate too many database hits.
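>
> As a rough way to check database hits on 2.2, a query can be prefixed with 
> PROFILE, which reports rows and database hits per operator of the plan. 
> The sketch below profiles the last-viewed-post query (4) with literal 
> values taken from this thread. The :Customer and :Post labels are 
> assumptions, and a schema index on :Customer(customerId) is assumed to 
> exist.
>
> ```cypher
> // PROFILE executes the query and shows the plan with rows and db hits.
> PROFILE
> MATCH (n:Customer {customerId: 'a899573d-3555-4c9d-ac1b-3f070a7decd7'})
>       -[:LIKES]->(a:Post)
> WHERE a.categoryId = '26' AND a.POST_STATUS = 'A'
> RETURN a.PostType AS PostType, a.State AS state
> ORDER BY a.POST_VIEWED_TIME DESC
> LIMIT 1
> ```
>
> A plan that begins with a scan over all nodes rather than an index seek on 
> the customer is a sign that the label or index is not being used.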
>
>
> I tried increasing the memory as well.. Didn't help. Let me know if my 
> CQL's are messed up..
>
> Thanks,
> Arun.
>
> On Sunday, May 10, 2015 at 12:59:17 PM UTC-4, Michael Hunger wrote:
>>
>> What are you doing? Can you share the type of workload / queries / code 
>> that you run?
>>
>> Which version are you using?
>>
>> According to your messages.log it spends all its time trying to free 
>> memory (causing the spike).
>>
>> wrapper.java.maxmemory=800
>>
>> -> you forgot to add a suffix here, so you do 800 bytes of heap not 800mb
>> change to
>>
>> wrapper.java.maxmemory=800M
>>
>>
>> An 800M heap is OK for smallish use cases.
>>
>> And you should 
>> 1. upgrade to 2.2.x
>> 2. alternatively use more memory for memory mapping
>>
>> # Default values for the low-level graph engine
>> neostore.nodestore.db.mapped_memory=100M
>> neostore.relationshipstore.db.mapped_memory=500M
>> neostore.relationshipgroupstore.db.mapped_memory=50M
>> neostore.propertystore.db.mapped_memory=500M
>> neostore.propertystore.db.strings.mapped_memory=250M
>> neostore.propertystore.db.arrays.mapped_memory=30M
>>
>>
>>
>> On 10.05.2015 at 16:24, Arun Kumar <[email protected]> wrote:
>>
>> Hi,
>>
>> Neo4j server CPU spikes up to 90% (and higher) as the node size increases 
>> to 50 MB.. Initially the CPU is well under 15% and suddenly spikes to 90% 
>> once certain size limit is reached. I have turned OFF the logs as well. 
>>
>> Below is the neo4j size configuration..
>>
>> # Default values for the low-level graph engine
>> neostore.nodestore.db.mapped_memory=40M
>> neostore.relationshipstore.db.mapped_memory=40M
>> neostore.propertystore.db.mapped_memory=150M
>> neostore.propertystore.db.strings.mapped_memory=70M
>> neostore.propertystore.db.arrays.mapped_memory=30M
>>
>> keep_logical_logs=false
>> keep_logical_logs=3 days
>>
>> Below is the heap size and JVM configuration..
>> # Initial Java Heap Size (in MB)
>> wrapper.java.initmemory=800
>>
>> # Maximum Java Heap Size (in MB)
>> wrapper.java.maxmemory=800
>>
>> wrapper.java.additional=-XX:+UseConcMarkSweepGC
>> wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
>> wrapper.java.additional=-XX:NewRatio=3
>> wrapper.java.additional=-d64
>> wrapper.java.additional=-server
>> wrapper.java.additional=-Xss2048k
>> wrapper.java.additional=-XX:+UseParNewGC
>>
>> I have attached message.log .. 
>>
>> Would appreciate any guidance in this issue.
>>
>> Thanks,
>> Arun.
>>
>>
>>
>>
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>> <message.log>
>>
>>
>>
>
>
>

