[jira] [Commented] (USERGRID-536) Change our index structure for static mapping and cleanup api

ASF GitHub Bot (JIRA) Mon, 13 Apr 2015 10:52:05 -0700

    [ https://issues.apache.org/jira/browse/USERGRID-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492734#comment-14492734 ]


ASF GitHub Bot commented on USERGRID-536:
-----------------------------------------

GitHub user tnine opened a pull request:

    https://github.com/apache/incubator-usergrid/pull/220

    Usergrid 536

    This is the first pass of the queryindex rewrite into the new structure.  
This structure is outlined in this ticket.
    
    https://issues.apache.org/jira/browse/USERGRID-536
    
    Note that several tests in core fail assertions.  I will work on these 
directly in the dev branch to stabilize, since this work is now blocking other 
developers.  These failures occur because the default sort ordering now works 
correctly, while the tests assert results in the wrong order.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-usergrid USERGRID-536

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-usergrid/pull/220.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #220
    
----
commit 2699dd30fdf4bedb939a6eabfaef787a863121f6
Author: Todd Nine <[email protected]>
Date:   2015-04-01T22:05:19Z

    Removed obsolete query object.   Now a single utility builder and a 
ParsedQuery that represents our result
    
    Some tests are failing due to case sensitivity.  This is due to indexing.  
This will be fixed later.

commit f1f87b08595921d3b3a989f2ff08a090a85da82b
Author: Todd Nine <[email protected]>
Date:   2015-04-02T21:06:04Z

    Moved files to reflect their public/internal usage
    
    Updated index and search scopes with new api

commit fb81d64bf3b3aa73de303e705b477761db94f52b
Author: Todd Nine <[email protected]>
Date:   2015-04-02T22:24:57Z

    Updated mappings

commit 1e3ea744e2bffb73d4b58889306a5c9c65c6be1c
Author: Todd Nine <[email protected]>
Date:   2015-04-03T02:18:59Z

    New mapping started.
    
    Basic object -> field mapping tested

commit 0fa30b9e1c13cb5322d8066010ac053e8c1ff9f8
Author: Todd Nine <[email protected]>
Date:   2015-04-03T19:39:14Z

    Fixed Entity document -> field parsing bugs
    
    Added more tests on parsing conversion.

commit ba41bd27c80daaa612cfac6e1bc2965cdc58b626
Author: Todd Nine <[email protected]>
Date:   2015-04-03T20:44:51Z

    Indexing complete

commit 543e13bf106864e29c7b21c37caabefba33790d3
Author: Todd Nine <[email protected]>
Date:   2015-04-06T15:04:19Z

    Fixes exception throwing removal

commit cb9b06a38e77c0da931181d04784865af93c5ac6
Author: Todd Nine <[email protected]>
Date:   2015-04-06T17:40:12Z

    Upgraded to latest ES version.  Made node client the default.  Can be 
configured with elasticsearch.client.type

commit 8f70c7c29e62f692c0852a6c21062eca52ec86a5
Author: Todd Nine <[email protected]>
Date:   2015-04-06T22:23:32Z

    Search requests are no working with types and scopes.

commit 965553478d70369112edb88b63843dc9a52ade9e
Author: Todd Nine <[email protected]>
Date:   2015-04-07T23:02:43Z

    Type coercion from int->long and float->double implemented.
    
    Implemented query/filter IR tree collapsing to avoid invalid requests
    
    Moved all queries to filters for performance except wildcard and contains 
operation

commit 05c27647737e301ac5caad2280bab5b88e73f1a4
Author: Todd Nine <[email protected]>
Date:   2015-04-08T15:56:52Z

    Updated geo tests and fixed geo sorting

commit 654e19f66f184f270c32733e5cd99b22f0693c42
Author: Todd Nine <[email protected]>
Date:   2015-04-08T23:44:40Z

    Fixes paging bug with sub fields

commit f95d39c744bafecd8b141f62d69ab08a1975a55a
Author: Todd Nine <[email protected]>
Date:   2015-04-08T23:49:45Z

    Removed unused 1.0 files.

commit e534d9c7385245ad369499cc05e93a9a7ece3324
Author: Todd Nine <[email protected]>
Date:   2015-04-09T07:10:53Z

    Removed legacy 1.0 code
    Cleaned up Query object to only be query state/builder
    Refactor of tests to use query language

commit 8060e16c9f56bf702cb2bf4404d6443b0faa5382
Author: Todd Nine <[email protected]>
Date:   2015-04-09T23:26:24Z

    Fixes querying by UUID

commit 99f480dd06c4309bc232adb4975dcd4d2ac0bfc6
Author: Todd Nine <[email protected]>
Date:   2015-04-13T15:53:01Z

    Fixes setup lifecycle bug by adding test rule to query
    
    Fixes init issue by adding initialize to the setup call.
    
    Fixes incorrect assertion order

commit 2eac3be133ada49d331171bcb2386c2635fa136e
Author: Todd Nine <[email protected]>
Date:   2015-04-13T17:37:58Z

    Completed compilation fixes.

commit 641804790ee9a9465d854d9d0179274a90f064d7
Author: Todd Nine <[email protected]>
Date:   2015-04-13T17:44:56Z

    Merge branch 'two-dot-o-dev' of 
https://git-wip-us.apache.org/repos/asf/incubator-usergrid into USERGRID-536

----


> Change our index structure for static mapping and cleanup api
> -------------------------------------------------------------
>
>                 Key: USERGRID-536
>                 URL: https://issues.apache.org/jira/browse/USERGRID-536
>             Project: Usergrid
>          Issue Type: Story
>          Components: Stack
>            Reporter: Todd Nine
>            Assignee: Todd Nine
>
> Currently, our dynamic mapping causes several issues with Elasticsearch.  We 
> should change our mapping to use a static structure, and resolve this 
> operational pain.
> We need to make the following changes.
> h2. Modify our IndexScope
> This should more closely resemble the elements of an edge since this 
> represents an edge. It will simplify the use of our query module and make 
> development clearer.  This scope should be refactored into the following 
> objects.  
> * IndexEdge - Id, name, timestamp, edgeType (source or target)
> * SearchEdge - Id, name, edgeType
> Note: edgeType is the type of the Id within the edge.  Does this Id represent 
> a source Id, or does it represent a targetId?  The entity to be indexed will 
> implicitly be the opposite of the type specified.  I.e., if it's a source edge, 
> the document is the target.  If it's a target edge, the document is the 
> source.
> These values should also be stored within our document, so that we can index 
> our documents.  Note that we perform bidirectional indexing in some cases, 
> such as users, groups, etc.  When we do this, we need to ensure that we mark the 
> direction of the edge appropriately.
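
A minimal sketch of the two scope objects described above, written in Python for brevity. The field names follow the ticket; `EdgeType`, `document_role`, the snake_case attribute names, and the use of plain strings for ids are illustrative assumptions, not the Usergrid API.

```python
from dataclasses import dataclass
from enum import Enum


class EdgeType(Enum):
    """Which end of the edge the scope's id refers to."""
    SOURCE = "source"
    TARGET = "target"


@dataclass(frozen=True)
class SearchEdge:
    node_id: str         # id of the node at one end of the edge
    name: str            # edge name
    edge_type: EdgeType  # whether node_id is the source or the target

    def document_role(self) -> EdgeType:
        """The indexed entity is implicitly the opposite end of the edge."""
        if self.edge_type is EdgeType.SOURCE:
            return EdgeType.TARGET
        return EdgeType.SOURCE


@dataclass(frozen=True)
class IndexEdge(SearchEdge):
    timestamp: int       # edge timestamp, only needed when indexing


edge = IndexEdge(node_id="app-1", name="users",
                 edge_type=EdgeType.SOURCE, timestamp=1428947525000)
print(edge.document_role().value)
```

Keeping the timestamp out of `SearchEdge` mirrors the ticket's split: a search only needs to identify the edge, while indexing also records when the edge was written.
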
> h2. Change default sort ordering
> When sorting is unspecified, we should order by timestamp descending from our 
> index edge.  This ensures that we retain the correct edge time semantics, and 
> will properly order collections and connections.
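
The default ordering could be expressed as a sort clause along these lines. This is a sketch only: `build_sort` and its `user_sort` parameter are hypothetical, and the clause uses the `edgeTimestamp` field name from the static mapping listed in this ticket.

```python
def build_sort(user_sort=None):
    """Return an Elasticsearch-style sort clause for a search request.

    `user_sort` is a hypothetical list of (field, direction) pairs taken
    from the parsed query. Sorting on entity fields would also need
    nested-field handling, which this sketch leaves out.
    """
    if user_sort:
        return [{field: {"order": direction}} for field, direction in user_sort]
    # No explicit ordering requested: newest edges first.
    return [{"edgeTimestamp": {"order": "desc"}}]


print(build_sort())
```
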
> h2. Remove the legacy query class
> We don't need the Query class; it has far too many functions to be a 
> well-encapsulated object.  Instead, we should simply take the string QL, the 
> SearchEdge and the limit to return our candidates.  From there, we should 
> parse and visit the query internally to the query logic, NOT externally.
> h2. Create a static mapping
> The mapping should contain the following static fields.
> * entityId - The entity id
> * entityType - The entity type (from the id)
> * entityVersion - The entity version
> * edgeId - The edge Id
> * edgeName - The edge name
> * edgeTimestamp - The edge timestamp
> * edgeType - source | target
> * edgeSearch - edgeId + edgeName + edgeType
> It will then contain an array of "fields".  Each of these fields will have the 
> following format.
> {code}
> { "name":"[entity field name as a path]", "[field type]":[field value] }
> {code}
> We will define a field type for each type of field.  Note that each field 
> tuple will always contain a single field and a single value.  Possible field 
> types are the following.
> * string - This will be mapped into 2 mappings using multi-field mapping.  It 
> will be indexed both as an unanalyzed string and as an analyzed string.  The 
> 2 fields will then be "string_u" and "string_a".  The Query visitor will need 
> to update the field name appropriately.
> * long - An unanalyzed long
> * double - An unanalyzed double
> * boolean - An unanalyzed boolean
> * location - A geolocation field
> The entity path will be a flattened path from the root JSON element to the 
> leaf JSON element.  It can be thought of as a path through the tree of JSON 
> elements.  We will use a dot '.' to delimit the fields, e.g. X.Y.Z for nested 
> objects.  Primitive arrays will contain a field object for each element in 
> the array.
> h2. Indexing
>   When indexing entities, we will no longer modify or prefix field names.  
> They will be inserted into the value exactly as their path appears after 
> lowercasing.
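
The flattening described in the mapping and indexing sections can be sketched as follows. This is an illustration under assumptions, not the code in the pull request: `field_tuples` is a hypothetical name, the `location` type is omitted because the ticket does not say how a geolocation is recognized, and routing the `string` key into "string_u"/"string_a" is left to the mapping.

```python
def field_tuples(value, path=""):
    """Yield one {"name": path, "<field type>": value} tuple per leaf value."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            # Paths are dot-delimited and lowercased; nothing else is changed.
            yield from field_tuples(child, child_path.lower())
    elif isinstance(value, list):
        # Arrays produce one field tuple per element, all under the same path.
        for element in value:
            yield from field_tuples(element, path)
    elif isinstance(value, bool):
        # bool must be tested before int, since bool is a subclass of int.
        yield {"name": path, "boolean": value}
    elif isinstance(value, int):
        yield {"name": path, "long": value}
    elif isinstance(value, float):
        yield {"name": path, "double": value}
    elif isinstance(value, str):
        yield {"name": path, "string": value}
    # None and unsupported types are skipped in this sketch.


entity = {"Name": "Bob", "Address": {"Zip": 94107}, "tags": ["a", "b"]}
for field in field_tuples(entity):
    print(field)
```

Each tuple carries exactly one name and one typed value, so every field type can be given a fixed mapping regardless of which property names entities happen to use.
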
> h2. Querying
>   When querying, the "contains" operation for a string will need to use the 
> "string_a" data type.  When using =, we will need to use the string_u data 
> type.  Each criterion will need to use nested object querying, to ensure the 
> property name and property value are both part of the same field tuple.
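
A sketch of how one string criterion might be turned into a nested clause. The `nested`, `bool`, `term`, and `match` clauses are standard Elasticsearch query DSL; the nested path `fields`, the `fields.name`, `fields.string_a`, and `fields.string_u` property names, and the `nested_criterion` helper are assumptions made for illustration.

```python
def nested_criterion(field_name, operator, value):
    """Build one nested clause so that the property name and the property
    value are matched inside the same field tuple."""
    if operator == "contains":
        # Full-text match against the analyzed variant.
        value_clause = {"match": {"fields.string_a": value}}
    elif operator == "=":
        # Exact match against the unanalyzed variant.
        value_clause = {"term": {"fields.string_u": value}}
    else:
        raise ValueError(f"unsupported operator in this sketch: {operator}")

    return {
        "nested": {
            "path": "fields",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"fields.name": field_name.lower()}},
                        value_clause,
                    ]
                }
            },
        }
    }


print(nested_criterion("Address.City", "=", "San Francisco"))
```

Without the nested wrapper, a document holding the tuples (name=a, value=x) and (name=b, value=y) would also match a query for name=a with value=y, which is the cross-matching the ticket wants to rule out.
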
> h3. References
> Multi Field Mapping: 
> http://www.elastic.co/guide/en/elasticsearch/reference/current/_multi_fields.html
> Nested Objects: 
> http://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html
> Nested Object Search: 
> http://www.elastic.co/guide/en/elasticsearch/guide/master/nested-sorting.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
