Re: [Neo4j] CPU spikes really high as the data size increases to 50 MB

Michael Hunger Sun, 10 May 2015 18:03:06 -0700

You should really update to a newer version of Neo4j.

With 2.2 you also get visual query plan profiling, which should help you a lot.
Most of your queries create way too much intermediate data.

Perhaps also get some hands-on consulting / help with writing your queries.

Michael

Some tips inline


> On 10.05.2015 at 23:58, Arun Kumar <[email protected]> wrote:
> 
> Michael,
> 
> Thanks for looking in to this.. 
> 
> We use Neo4j as recommendation engine... We have movies, classifieds services 
> listed in our site.. We recommend movies or classifieds to our customers 
> based on their browsing behaviors... Below are some of the CQL's, we use..
> 
> 1. Movie recommendation CQL..
> 
>  start 
> n=node:node_auto_index('customerId:*'),n1=node:node_auto_index(customerId = 
> {customerId}) 
>  MATCH n-[:LIKES]->movie where movie.Language = {language} and 
> movie.MOVIE_STATUS={status} 
>  with DISTINCT movie as mov,n1 where not(n1-[:LIKES]->mov)  
>  with mov as movDet, count(mov) as movCt return 
> movCt,movDet.Movie,movDet.Language order by movCt desc
> 
-> This query creates a huge cross product.
-> You create a row for each customer and every movie they ever liked, which 
might be millions of rows.
-> I would rather look up the movies by language and status and follow the 
LIKES relationship backwards.
-> The WHERE NOT is an expensive operation that is evaluated for each pair.
-> You create too many paths in between; you should use DISTINCT or an 
aggregation (even in your query).
-> You should add a LIMIT to the result.

-> How many movies do you have in the database?

> start movie=node:node_auto_index('language:EN 
> status:ACTIVE'),n1=node:node_auto_index(customerId = {customerId}) 
>  MATCH n-[:LIKES]->movie
>  with distinct movie as mov,n1
>  where not(n1-[:LIKES]->mov)  
>  with mov as movDet, count(mov) as movCt
> return movCt,movDet.Movie,movDet.Language
> order by movCt desc
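
For 2.2, the same idea could look roughly like the sketch below. It assumes 
:Customer and :Movie labels and schema indexes on :Customer(customerId) and 
:Movie(Language), none of which exist in your current model, and the LIMIT of 
10 is an arbitrary choice.

```cypher
// Assumption: :Customer and :Movie labels plus matching schema indexes exist.
MATCH (me:Customer {customerId: {customerId}})
MATCH (mov:Movie {Language: {language}, MOVIE_STATUS: {status}})<-[:LIKES]-(other)
WHERE NOT (me)-[:LIKES]->(mov)   // skip movies this customer already likes
RETURN mov.Movie AS movie, mov.Language AS language, count(other) AS likes
ORDER BY likes DESC
LIMIT 10                         // arbitrary cap, tune to what the page shows
```

Starting from the filtered movies keeps the intermediate rows proportional to 
the likes on matching movies rather than to all likes in the database.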


> >>> Below query is used to identify the members language..

-> this query is not correct as the count would always be 1 you don't aggregate 
by the same data you group by

>  
> start n=node:node_auto_index(customerId = {customerId}) 
> MATCH n-[:LIKES]->movie return movie.Language,count(movie.Language) order by 
> count(movie.Language) desc 
> 
> 2. Other recommendation CQL..

This is another cross-product query.
What is "a", a movie or an actor?

You should make sure that the lookup of "a" is done via a schema index.
Make sure that as few "a" nodes as possible are selected.

> 
> start n=node:node_auto_index(customerId = 
> 'a899573d-3555-4c9d-ac1b-3f070a7decd7')
> MATCH a WHERE NOT (n)-[:LIKES]->(a) and  
> a.categoryId='26'  and a.State='New Jersey' and a.PostType='Offering'  and 
> a.POST_STATUS='A' return a.id as postId order by a.Zipcode  limit 5
> 
> 
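As a rough sketch only: with labels and schema indexes this query could be 
written as below. The :Post and :Customer labels, both indexes, and the 
parameter names are assumptions, not part of your current model. Existing 
nodes would need the labels added first; run each statement on its own.

```cypher
// Assumption: posts are the nodes that carry a categoryId property.
MATCH (a) WHERE has(a.categoryId) SET a:Post;
MATCH (n) WHERE has(n.customerId) SET n:Customer;

// Schema indexes (2.x syntax) so both lookups avoid a full scan.
CREATE INDEX ON :Post(categoryId);
CREATE INDEX ON :Customer(customerId);

// The recommendation query, starting from the indexed lookups.
MATCH (n:Customer {customerId: {customerId}})
MATCH (a:Post {categoryId: {categoryId}})
WHERE a.State = {state} AND a.PostType = {postType} AND a.POST_STATUS = 'A'
  AND NOT (n)-[:LIKES]->(a)
RETURN a.id AS postId
ORDER BY a.Zipcode
LIMIT 5
```
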
Make sure to do the same as above.

Going over all customers doesn't make sense; just remove the auto-index lookup, 
make sure you use a label like a:Post, and add a schema index for the 
categoryId.

your result also doesn't make sense as you again aggregate by the same thing 
you return

> 3. Trending CQL..
> 
> start n=node:node_auto_index('customerId :*') 
> MATCH a WHERE (n)-[:LIKES]->(a) and  a.categoryId={categoryId} return a.id as 
> postId, count(a) order by count(a) DESC limit 2
> 
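
A possible shape for the trending query without the customer lookup, again 
only a sketch: it assumes a :Post label and a schema index on 
:Post(categoryId), and it keeps the grouping by post id from the original.

```cypher
// Assumption: :Post label with a schema index on categoryId.
MATCH (a:Post {categoryId: {categoryId}})<-[:LIKES]-()
RETURN a.id AS postId, count(*) AS likes   // one row per post, likes counted
ORDER BY likes DESC
LIMIT 2
```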

> 4. Identifying last viewed posts..

You forgot the label on your customer/user, so the index will not be used; the 
same goes for "a".

> 
> match (n {customerId : 'a899573d-3555-4c9d-ac1b-3f070a7decd7'})-[:LIKES]->(a) 
>  where a.categoryId='26' and a.POST_STATUS='A' 
> return a.PostType as PostType,a.Stay as Stay,a.Salary as Salary,a.Age as 
> Age,a.Language as Language,a.Experience as Experience,
> a.State as state order by a.POST_VIEWED_TIME desc limit 1
> 
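
With labels added, the lookup above might read as follows. This is a sketch 
that assumes :Customer and :Post labels and a schema index on 
:Customer(customerId); the parameter names are placeholders.

```cypher
// Assumption: :Customer and :Post labels, schema index on :Customer(customerId).
MATCH (n:Customer {customerId: {customerId}})-[:LIKES]->(a:Post)
WHERE a.categoryId = {categoryId} AND a.POST_STATUS = 'A'
RETURN a.PostType AS PostType, a.Stay AS Stay, a.Salary AS Salary, a.Age AS Age,
       a.Language AS Language, a.Experience AS Experience, a.State AS state
ORDER BY a.POST_VIEWED_TIME DESC
LIMIT 1
```
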
> All these queries will be fired for each member when they move across each 
> page.. At any given point of time we would have 20 members on an average in 
> the site and get monthly 400K page views.. Not much though...

Make sure first that your queries are in the 10-100 ms range and don't generate 
too many database hits.
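
One way to check this in 2.2 is to prefix a query with PROFILE, which runs it 
and shows rows and db hits per operator. The example below reuses the 
customerId from your mail and assumes a :Customer label that your data does 
not have yet.

```cypher
// PROFILE executes the query and reports the plan with db hits per step.
PROFILE
MATCH (n:Customer {customerId: 'a899573d-3555-4c9d-ac1b-3f070a7decd7'})-[:LIKES]->(movie)
RETURN movie.Language AS language, count(*) AS likes
ORDER BY likes DESC
```

Operators with very high db hits or row counts are the ones to rewrite first.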

> 
> I tried increasing the memory as well.. Didn't help. Let me know if my CQL's 
> are messed up..
> 
> Thanks,
> Arun.
> 
> On Sunday, May 10, 2015 at 12:59:17 PM UTC-4, Michael Hunger wrote:
> What are you doing? Can you share the type of workload / queries / code that 
> you run?
> 
> Which version are you using?
> 
> According to your messages.log, it spends all its time trying to free memory 
> (causing the spike).
> 
>> wrapper.java.maxmemory=800
> -> you forgot to add a suffix here, so you do 800 bytes of heap not 800mb
> change to
> 
>> wrapper.java.maxmemory=800M
> 
> An 800M heap is OK for smallish use cases.
> 
> And you should 
> 1. upgrade to 2.2.x
> 2. alternatively use more memory for memory mapping
> 
>> # Default values for the low-level graph engine
>> neostore.nodestore.db.mapped_memory=100M
>> neostore.relationshipstore.db.mapped_memory=500M
>> neostore.relationshipgroupstore.db.mapped_memory=50M
>> neostore.propertystore.db.mapped_memory=500M
>> neostore.propertystore.db.strings.mapped_memory=250M
>> neostore.propertystore.db.arrays.mapped_memory=30M
> 
> 
> 
>> On 10.05.2015 at 16:24, Arun Kumar <ar...@pragathi.com> wrote:
>> 
>> Hi,
>> 
>> Neo4j server CPU spikes up to 90% (and higher) as the node size increases to 
>> 50 MB.. Initially the CPU is well under 15% and suddenly spikes to 90% once 
>> certain size limit is reached. I have turned OFF the logs as well. 
>> 
>> Below is the neo4j size configuration..
>> 
>> # Default values for the low-level graph engine
>> neostore.nodestore.db.mapped_memory=40M
>> neostore.relationshipstore.db.mapped_memory=40M
>> neostore.propertystore.db.mapped_memory=150M
>> neostore.propertystore.db.strings.mapped_memory=70M
>> neostore.propertystore.db.arrays.mapped_memory=30M
>> 
>> keep_logical_logs=false
>> keep_logical_logs=3 days
>> 
>> Below is the heap size and JVM configuration..
>> # Initial Java Heap Size (in MB)
>> wrapper.java.initmemory=800
>> 
>> # Maximum Java Heap Size (in MB)
>> wrapper.java.maxmemory=800
>> 
>> wrapper.java.additional=-XX:+UseConcMarkSweepGC
>> wrapper.java.additional=-XX:+CMSClassUnloadingEnabled
>> wrapper.java.additional=-XX:NewRatio=3
>> wrapper.java.additional=-d64
>> wrapper.java.additional=-server
>> wrapper.java.additional=-Xss2048k
>> wrapper.java.additional=-XX:+UseParNewGC
>> 
>> I have attached message.log .. 
>> 
>> Would appreciate any guidance in this issue.
>> 
>> Thanks,
>> Arun.
>> 
>> 
>> 
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to neo4j+un...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>> <message.log>