Re: [Neo4j] Re: Indexing, CSV import and SDN retrieval not working together?

Liliana Ziolek Tue, 01 Jul 2014 12:07:39 -0700

Okay, that makes sense, wasn't aware of limitation #1 (and the rest
that is a result), and you're totally right - it's not worth the
effort considering it's not going to be around for long.


I'll go with the manual index update as you suggest, which I'll be
able to remove once proper, non-legacy fulltext indexing comes in
Neo4j.
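
For anyone reading along, here is a toy in-memory sketch of why the imported cities show up in findAll but not in the name lookup, and what the manual index update changes. It is only a model: the class and method names are made up for illustration and none of this is Neo4j or SDN API. The "store" stands in for the graph, the "index" for the manually maintained legacy index.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a graph plus a manually maintained index.
public class ManualIndexModel {
    private final List<String> store = new ArrayList<>(); // every saved city name
    private final Set<String> index = new HashSet<>();    // names that have an index entry

    // Models a save that also writes the index entry (the SDN path).
    public void saveWithIndexEntry(String name) {
        store.add(name);
        index.add(name);
    }

    // Models a write that only touches the graph (the Cypher import path).
    public void saveWithoutIndexEntry(String name) {
        store.add(name);
    }

    // Models the manual update: walk all stored names and (re)add each entry.
    public void reindexAll() {
        for (String name : store) {
            index.remove(name); // in case it was already there
            index.add(name);
        }
    }

    public List<String> findAll() {
        return new ArrayList<>(store);
    }

    public boolean foundByIndex(String name) {
        return index.contains(name);
    }

    public static void main(String[] args) {
        ManualIndexModel model = new ManualIndexModel();
        model.saveWithoutIndexEntry("Poznan");
        System.out.println(model.findAll().contains("Poznan")); // true
        System.out.println(model.foundByIndex("Poznan"));       // false
        model.reindexAll();
        System.out.println(model.foundByIndex("Poznan"));       // true
    }
}
```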

Thanks a lot!

On Tue, Jul 1, 2014 at 7:48 PM, Michael Hunger
<[email protected]> wrote:
> Hi,
>
> you have a pretty good understanding and you're right about that.
>
> Unfortunately there is more to it:
>
> #1 there is only one of these automatic indexes globally, called
> node_auto_index.
> So any field in any class that uses this fulltext would have to use the same
> index, and if even two of them share the same property (e.g. "description")
> there would be clashes, in terms of returning the wrong types of nodes.
> #2 the indexes are also used when querying the graph, so the automatic
> query generation would have to take into account that certain fields are
> now using this auto-index and generate different types of queries using
> this index. But what happens if you have two entities with fulltext
> entries? If it should use that index for both, then it would have 1) to
> disambiguate them and 2) to construct a lucene "OR" query instead of the
> normal index query, which is even more ugly in fulltext mode (as you have
> to quote lookup values and then your entries cannot contain spaces) -> all
> in all a big mess and effort.
> #3 as those are legacy indexes, all this effort would only have a
> "short-lived" effect and would then be removed again altogether.
>
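As an illustration of the quoting mentioned in #2, here is a minimal sketch of building such an "OR" query string by hand in Lucene's classic query syntax (field:"phrase" terms joined with OR). The class and method names are hypothetical and this is not what SDN generates; it only shows that the lookup value has to be escaped and wrapped in quotes before it can contain a space.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of SDN or Neo4j.
public class LuceneOrQuery {

    // Escape backslashes and double quotes, then wrap the value in quotes so
    // that a value containing spaces is treated as a single phrase.
    public static String quote(String value) {
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    // Build field1:"value" OR field2:"value" ... for the given fields.
    public static String orQuery(List<String> fields, String value) {
        String quoted = quote(value);
        return fields.stream()
                .map(field -> field + ":" + quoted)
                .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        // prints: name:"New York" OR description:"New York"
        System.out.println(orQuery(Arrays.asList("name", "description"), "New York"));
    }
}
```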
> I think it would make more sense to write a 5 line java function that just
> adds those entries to the index (I just realized I should have pointed that
> out from the beginning)
>
> try (Transaction tx = db.beginTx()) {
>    // "locations" is the fulltext index SDN has already created
>    Index<Node> fts = db.index().forNodes("locations");
>    for (Node city :
>            GlobalGraphOperations.at(db).getAllNodesWithLabel(DynamicLabel.label("City"))) {
>        String location = (String) city.getProperty("name");
>        fts.remove(city); // in case it was already there, alternatively do a check
>        fts.add(city, "name", location);
>    }
>    tx.success(); // mark the transaction for commit, otherwise it rolls back
> }
>
>
>
> On Tue, Jul 1, 2014 at 8:29 PM, Liliana Ziolek <[email protected]>
> wrote:
>>
>> Let me check that I understand all this correctly.
>> There are two options to achieve the fulltext index at the moment:
>> - (legacy?) manual fulltext index, which is how it is configured for
>> me right now, where SDN handles the operation of adding things to the
>> index but that means that when I add the nodes from Cypher, the index
>> entries do not get created
>> - legacy automatic fulltext index, where Neo4j would automatically
>> maintain the index when data is inserted/edited, but SDN wouldn't be
>> able to touch that index
>>
>> What I don't get is why making SDN understand the second option is a
>> big effort. My naive understanding is that what SDN does behind the
>> scenes in the first case is something like:
>> - save node, setting the value of the indexed field
>> - add the index entry - something along the lines of:
>> graphDb.index().forNodes( "locations" ).add( city, "name",
>> city.getProperty( "name" ) );
>>
>> If we let Neo4j handle adding to the index, isn't it simply a
>> matter of adding a new index type so that you can specify it in the
>> annotation and then removing the second step for writes (+ a slight
>> change on schema creation)? Why do the reads/queries need to change as
>> well - do you need to provide the details of the index when building
>> the query?
>>
>>
>> On Tue, Jul 1, 2014 at 11:04 AM, Michael Hunger
>> <[email protected]> wrote:
>> > Yep, that would work.
>> >
>> > And yes, it is planned to add automatic fulltext, spatial and other
>> > indexes
>> > to Neo4j in the future.
>> >
>> > But let's try to work out together an easy way to get the full text
>> > functionality working nonetheless. Would be a fun challenge :)
>> >
>> >
>> >
>> > On Tue, Jul 1, 2014 at 11:09 AM, Liliana Ziolek
>> > <[email protected]>
>> > wrote:
>> >>
>> >> Would it help if the index wasn't a full text index but a normal one -
>> >> would that make it work?
>> >>
>> >> More importantly, you say that cypher won't support writing to legacy
>> >> indexes - and as I understand, currently full text index is of legacy
>> >> type.
>> >> Is there a plan to introduce a new, cypher-supported full text index in
>> >> the
>> >> future neo4j? I'd be happy to go with standard index for now if there
>> >> was
>> >> hope that in the future I can just change the index type and go.
>> >>
>> >> Thanks!
>> >>
>> >> On Jul 1, 2014 9:32 AM, "Michael Hunger"
>> >> <[email protected]> wrote:
>> >>>
>> >>> Hey,
>> >>>
>> >>> you are totally right, this sucks. Let me explain why.
>> >>>
>> >>> right now Cypher can't update fulltext indexes, which SDN uses. These
>> >>> are
>> >>> the legacy Neo4j indexes which require manual addition.
>> >>> You're right this is really sucky, but except for coding (i.e.
>> >>> manually
>> >>> adding the nodes, properties, values to that fulltext index) or using
>> >>> the
>> >>> neo4j-shell for that index-update, I have no good idea.
>> >>>
>> >>> It was decided consciously that Cypher will not support writing to
>> >>> legacy
>> >>> indexes.
>> >>>
>> >>> The only thing that you can do is to use a legacy auto-index
>> >>> configured
>> >>> as fulltext, but as that index is read-only, SDN can't write to it, so
>> >>> the
>> >>> field would have to be "read-only". Or we would have to add something
>> >>> to SDN
>> >>> that marks a field as using that legacy auto-index and never actually
>> >>> writing to the index itself. But then that also means this has to be
>> >>> taken
>> >>> into account with every other read operation and query generation
>> >>> which
>> >>> makes it a pretty big effort.
>> >>>
>> >>> So for now I'd rather advise to implement the CSV loading as SDN code
>> >>> using OpenCSV as reader (which is what cypher uses too).
>> >>>
>> >>> String[] header = reader.readNext();
>> >>> String[] row;
>> >>> while ((row = reader.readNext()) != null) {
>> >>>    // get(row, header, name) looks up a column value by its header name
>> >>>    Country country = template.save(new Country(get(row, header, "Country")));
>> >>>    City city = template.save(new City(get(row, header, "City"), country));
>> >>>    Airport ap = template.save(new Airport(get(row, header, "Airport"),
>> >>>            get(row, header, "IATAcode"), get(row, header, "ICAOcode")));
>> >>>    ap.serve(city);
>> >>>    template.save(ap);
>> >>> }
>> >>>
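The loop above calls a get(row, header, name) helper that is not shown anywhere in the thread. A minimal sketch of what it might look like, assuming it simply finds the column position in the header row and returns the value at that position (the class name and the null-on-missing behaviour are assumptions, and the sample row is made up to match the CSV fields from the original post):

```java
import java.util.Arrays;

// Hypothetical helper class; only get(...) is referenced in the thread.
public class CsvColumns {

    // Return the value of the column called `name`, or null if the header has
    // no such column or the row is shorter than the header.
    public static String get(String[] row, String[] header, String name) {
        int index = Arrays.asList(header).indexOf(name);
        if (index < 0 || index >= row.length) {
            return null;
        }
        return row[index];
    }

    public static void main(String[] args) {
        String[] header = {"Airport", "City", "Country", "IATAcode", "ICAOcode"};
        String[] row = {"Lawica", "Poznan", "Poland", "POZ", "EPPO"};
        System.out.println(get(row, header, "City"));     // Poznan
        System.out.println(get(row, header, "IATAcode")); // POZ
    }
}
```

Looking the column up by name keeps the loop independent of the column order in the file, at the cost of a linear scan of the header for every value.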
>> >>> Sorry for not being more helpful,
>> >>>
>> >>> Cheers,
>> >>>
>> >>> Michael
>> >>>
>> >>> On Tue, Jul 1, 2014 at 9:46 AM, Liliana Ziolek
>> >>> <[email protected]>
>> >>> wrote:
>> >>>>
>> >>>> Oh, in case that helps, I'm using Neo4j 2.1.2 and SDN 3.2.0-SNAPSHOT.
>> >>>>
>> >>>>
>> >>>> On Tuesday, July 1, 2014 8:46:00 AM UTC+1, Liliana Ziolek wrote:
>> >>>>>
>> >>>>> Hi,
>> >>>>> I'm trying to do - I thought - a rather simple thing - import data
>> >>>>> through CSV import and then work with it via SDN. It doesn't seem to
>> >>>>> be
>> >>>>> working though and not sure if I'm doing something silly or it
>> >>>>> doesn't work.
>> >>>>> It seems to me that the data is created fine and SDN can access it
>> >>>>> via
>> >>>>> findAll method, but it cannot find it using the indexed field.
>> >>>>>
>> >>>>> My POJO is quite simple, the important bit:
>> >>>>>
>> >>>>> public class City extends GraphNode {
>> >>>>>     @Indexed(indexType = IndexType.FULLTEXT, indexName =
>> >>>>> "locations")
>> >>>>>     private String name;
>> >>>>> (... other fields)
>> >>>>> }
>> >>>>>
>> >>>>> I have an SDN repository with the following methods:
>> >>>>> public interface CityRepository extends GraphRepository<City> {
>> >>>>>     Page<City> findByNameLike(String name, Pageable page);
>> >>>>>     List<City> findByName(String cityName);
>> >>>>> ...
>> >>>>> }
>> >>>>> On top of that a super-simple service that pretty much just wraps
>> >>>>> that
>> >>>>> into @Transactional.
>> >>>>>
>> >>>>> Everything works fine when I use SDN to put the data in and take it
>> >>>>> out, this passes fine:
>> >>>>>         locationService.addCity("Poznan", "Poland");
>> >>>>>         List<City> citiesByNameLike =
>> >>>>> locationService.getCitiesByNameLike("Pozn*");
>> >>>>>         assertThat(citiesByNameLike, hasSize(1));
>> >>>>>         assertThat(locationService.getCitiesByName("Poznan"),
>> >>>>> equalTo(citiesByNameLike));
>> >>>>>
>> >>>>> However, when I import it via CSV import and MERGE, even though I
>> >>>>> can
>> >>>>> see that the city is actually there (when I run findAll), it doesn't
>> >>>>> come
>> >>>>> back when I try to look it up by name.
>> >>>>> CSV import query:
>> >>>>> //csv fields: Airport,City,Country,IATAcode,ICAOcode
>> >>>>>         String cypherLoadCountries = "LOAD CSV WITH HEADERS FROM \""
>> >>>>> +
>> >>>>> fileLocation + "\" AS csvLine "
>> >>>>>                 + "MERGE (country:Country:_Country { name:
>> >>>>> csvLine.Country } ) "
>> >>>>>                 + "MERGE (city:City:_City { name: csvLine.City } ) "
>> >>>>>                 + "MERGE (city) - [:IS_IN] -> (country) "
>> >>>>>                 + "MERGE (airport:Airport:_Airport {name:
>> >>>>> csvLine.Airport, iataCode: csvLine.IATAcode, icaoCode:
>> >>>>> csvLine.ICAOcode} ) "
>> >>>>>                 + "MERGE (airport) - [:SERVES {__type__:
>> >>>>> 'AirportCityConnection'}] -> (city)";
>> >>>>>         neo4jTemplate.query(cypherLoadCountries, ImmutableMap.of());
>> >>>>>
>> >>>>> Am I doing something silly here? Is there meant to be a call to
>> >>>>> switch
>> >>>>> on indexing or perhaps SDN indexes a different field?
>> >>>>> Any help appreciated.
>> >>>>
>> >>>> --
>> >>>> You received this message because you are subscribed to the Google
>> >>>> Groups "Neo4j" group.
>> >>>> To unsubscribe from this group and stop receiving emails from it,
>> >>>> send
>> >>>> an email to [email protected].
>> >>>> For more options, visit https://groups.google.com/d/optout.
>> >>>
>> >>>
>> >>> --
>> >>> You received this message because you are subscribed to a topic in the
>> >>> Google Groups "Neo4j" group.
>> >>> To unsubscribe from this topic, visit
>> >>> https://groups.google.com/d/topic/neo4j/w44ApzfsRQI/unsubscribe.
>> >>> To unsubscribe from this group and all its topics, send an email to
>> >>> [email protected].
>> >>>
>> >>> For more options, visit https://groups.google.com/d/optout.
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Liliana
>> "Write your code as if the person maintaining it is a homicidal maniac
>> who knows where you live."
>>
>
>



-- 
Liliana
"Write your code as if the person maintaining it is a homicidal maniac
who knows where you live."

