Ahh! That's my bad! Sorry! I corrected it and ran it again. But I still get some taxa that are the same but have multiple nodes. The updated code is available here <https://github.com/sunitj/SuperMoM/blob/master/IMG/createDB.pl> (lines: 368-393). Here is the snippet:
Also, what's the difference between how I'm creating nodes and using the create_unique function, aside from maybe saving me a few lines? I was trying to use the function, but wasn't able to figure out what the first and second arguments were. I couldn't find it on the blog I mentioned above, and in your slides they were both "name=>$pkg" or similar.

--
Sunit Jain
Research Computing Specialist -- Bioinformatics
Michigan Geomicrobiology Lab
Dept. of Earth & Environmental Sciences,
University of Michigan,
Ann Arbor, MI, USA.
email: [email protected]
web: www.sunitjain.com
meet: www.sunitjain.com/contact

On Wed, Mar 18, 2015 at 10:10 PM, Mark Jensen <[email protected]> wrote:

> Thanks Sunit --
> I think your problem is the difference highlighted in the code below.
> You're looking for the species with the key 'name', but adding to the index
> with key 'id'.
>
> You may find the $idx->create_unique()
> <https://metacpan.org/pod/REST::Neo4p::Index#create_unique> method helpful
> too.
> MAJ
>
> if ($PhyloDist{$gene}{"DOMAIN"}) {
>     my $species=$PhyloDist{$gene}{"SPECIES"};
>     ($taxa_nodes{$gene})= $idx->find_entries(name=>$species);
>     unless ($taxa_nodes{$gene}) {
>         $taxa_nodes{$gene}=REST::Neo4p::Node->new({id=>$PhyloDist{$gene}{"SPECIES"}});
>         $taxa_nodes{$gene}->set_labels("Taxa");
>         foreach (keys %{$PhyloDist{$gene}}){
>             next if $_ eq "SPECIES";
>             next if $_ eq "PERCENT";
>             my $value=lc($PhyloDist{$gene}{$_});
>             my $key=lc($_);
>             $taxa_nodes{$gene}->set_property({$key=>$value});
>         }
>         $idx->add_entry($taxa_nodes{$gene}, id=>$species);
>     }
>     ...
> }
>
>
> On Wednesday, March 18, 2015 at 5:30:15 PM UTC-4, Sunit Jain wrote:
>>
>> First, congratulations on creating such a great perl driver for Neo4j. I
>> really appreciate the work you must have put into it.
>>
>> I've been trying to use this driver to create a database for our
>> meta*omic data. I was successfully able to put together some perl code by
>> following some slides <http://www.slideshare.net/majensen1/dcpm-meetup>,
>> the neo4j blog post <http://neo4j.com/blog/restneo4p-a-perl-ogm/> about
>> this driver and the MetaCPAN <https://metacpan.org/pod/REST::Neo4p>
>> description. However, I'm getting stuck at a point where I'm no longer sure
>> what's going on. I'm hoping you might be able to help.
>>
>> *As a side note, the example on the neo4j blog
>> <http://neo4j.com/blog/restneo4p-a-perl-ogm/> seemed very limited and about
>> 2 years old. Is there a more recent version somewhere? Maybe one with best
>> practices? If not, I'd be happy to start one explaining what I did for my
>> current project, once I have at least one successful run. It won't be
>> as insightful, but it'll be something.*
>>
>> *Goal:*
>> Create unique Taxa nodes, and have the gene loci that belong to a Taxa
>> relate to it with an "IN_ORGANISM" relationship:
>>
>> (Taxa)<-[:IN_ORGANISM]-(Locus)
>>
>>
>> More details can be found in createDB.pl (lines: 326-352), here
>> <https://github.com/sunitj/SuperMoM/tree/master/IMG>
>>
>> *Issue:*
>> Here is the perl snippet of my code to create unique 'Taxa' nodes:
>> [image: Inline image 1]
>>
>> Perl snippet to create unique relations to Taxa:
>> [image: Inline image 2]
>>
>> When I run this script, it creates the exact same taxa node 94 times! I
>> did a quick grep in my CSV and found that there were 94 instances of that
>> taxa. So, the script essentially created a new node each time it
>> encountered a species. I also created some scaffold, loci, COG, PFam and
>> Project nodes in much the same way, but only unique nodes were created in
>> all the other instances. The only difference was that the property "id" was
>> "$species", which is a text value with spaces in the case of Taxa, but for
>> all others it was an alphanumeric value without spaces. I don't see how
>> this could affect the outcome.
>>
>> I apologize for the lengthy email.
>>
>> ================
>> Linux RHEL Server 6.5
>> Perl 5.18
>> Neo4j 2.1.7
>> Java 1.7
>> ================
>> --
>> Sunit Jain
>> Research Computing Specialist -- Bioinformatics
>> Michigan Geomicrobiology Lab
>> Dept. of Earth & Environmental Sciences,
>> University of Michigan,
>> Ann Arbor, MI, USA.
>> web: www.sunitjain.com
>> meet: www.sunitjain.com/contact
>>

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
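
For anyone reading this thread later: the key mismatch Mark points out (looking up with 'name' but indexing with 'id') can be reproduced without a Neo4j server. The sketch below is only an illustration under stated assumptions. `MockIndex` and `count_created` are hypothetical names invented here; `MockIndex` is a plain in-memory hash that loosely imitates the `find_entries`/`add_entry` calls used in the thread, and it is not the REST::Neo4p implementation.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for an index (NOT REST::Neo4p::Index).
# Items are stored under a (key, value) pair, so a lookup only succeeds
# when it uses the same key the item was added with.
package MockIndex;

sub new {
    my ($class) = @_;
    return bless { entries => {} }, $class;
}

sub add_entry {
    my ($self, $item, $key, $value) = @_;
    push @{ $self->{entries}{$key}{$value} }, $item;
    return;
}

sub find_entries {
    my ($self, $key, $value) = @_;
    return @{ $self->{entries}{$key}{$value} // [] };
}

package main;

# Count how many "nodes" get created for a list of species when the lookup
# uses $find_key and the indexing uses $add_key.
sub count_created {
    my ($find_key, $add_key, @species) = @_;
    my $idx     = MockIndex->new;
    my $created = 0;
    for my $species (@species) {
        my ($node) = $idx->find_entries($find_key => $species);
        next if $node;                  # found in the index: reuse it
        $node = { id => $species };     # stand-in for creating a new node
        $created++;
        $idx->add_entry($node, $add_key => $species);
    }
    return $created;
}

my @species    = ('Escherichia coli') x 3;
my $mismatched = count_created('name', 'id', @species);
my $matched    = count_created('id',   'id', @species);

print "lookup 'name', add 'id': $mismatched nodes created\n";  # 3
print "lookup 'id',   add 'id': $matched node created\n";      # 1
```

With mismatched keys the lookup never finds the earlier entry, so every occurrence of a species creates another node, which is the same pattern as the 94 duplicate Taxa nodes described in the original post. Using one key consistently for both the lookup and the insert is what makes the check-then-create pattern produce a single node per species.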
