Hmmm, deleting the files in .m2 doesn’t stop it from searching in locationtech, 
and using the other mvn command gives me no log output.

On 3/1/17, 10:55 AM, "Aaron D. Mihalik" <[email protected]> wrote:

    traversing: gotcha.  I completely understand now.  And now I understand
    how the prospector table would help with snipping out those nodes.
    
    maven: yep, that's the right git repo.  Locationtech is required when you
    build with the 'geoindexing' profile.  Regardless, it's strange that maven
    tried to get the apache pom from locationtech.  Deleting the
    org/apache/apache directory should force maven to download the apache pom
    from maven central.
    
    --Aaron
    
    On Wed, Mar 1, 2017 at 1:47 PM Liu, Eric <[email protected]> wrote:
    
    > Oh, that’s not an issue; that’s what we would like to do when traversing
    > through the data. If a node has a high cardinality, we don’t want to
    > further traverse through its children.
    >
    > As for installation, did I clone the right repo for Rya? The one I’m using
    > has locationtech repos for SNAPSHOT and RELEASE:
    > https://github.com/apache/incubator-rya/blob/master/pom.xml
    >
    > On 3/1/17, 6:09 AM, "Aaron D. Mihalik" <[email protected]> wrote:
    >
    >     Repos: The locationtech repo is up [1].  The issue is that your local
    >     .m2 repo is in a bad state.  Maven is trying to get the apache pom from
    >     locationtech.  Locationtech does not host that pom; instead it's on
    >     maven central [2].
    >
    >     Two ways to fix this issue (you should do (1) and that'll fix it...
    >     (2) is just another option for reference).
    >
    >     1. Delete your apache pom directory from your local maven repo
    >     (e.g. rm -rf ~/.m2/repository/org/apache/apache/)
    >
    >     2. Tell maven to ignore remote repository metadata with the -llr flag
    >     (e.g. mvn clean install -llr -Pgeoindexing)
    >
    >     Let me know if you have any other issues.
    >
    >     deep/wide: okay, I don't understand this statement: "if the
    >     cardinality of a node is too high (for example, a user that owns a
    >     large number of datasets), the neighbors of that node will not be
    >     found."  Is this a property of your current datastore, or is this an
    >     issue with Rya?
    >
    >     --Aaron
    >
    >     [1]
    >
    > 
https://repo.locationtech.org/content/repositories/releases/org/locationtech/geomesa/
    >     [2] http://repo1.maven.org/maven2/org/apache/apache/17/
    >
    >     On Wed, Mar 1, 2017 at 7:43 AM Puja Valiyil <[email protected]> wrote:
    >
    >     > Hey Eric,
    >     > Regarding the repos: sometimes the locationtech repos go down; your
    >     > best bet is to wait a little bit and try again.  You can also
    >     > download the latest artifacts off of the apache build server.
    >     > Since locationtech is only used for the geo profile, we may want to
    >     > move where that repo is declared (or put it in the geo profile).
    >     > For your use case, you could look at using the cardinality in the
    >     > prospector services for individual nodes.  The prospector services
    >     > are typically run once and the results treated as representative
    >     > (which wouldn't work for your use case), but you could run them
    >     > regularly to keep track of counts.  Are you using the count keyword
    >     > or just manually counting edges?  The count keyword is pretty
    >     > inefficient currently.  We could maybe add that to our list of
    >     > priorities.
    >     >
    >     > Sent from my iPhone
    >     >
    >     > > On Mar 1, 2017, at 3:00 AM, Liu, Eric <[email protected]>
    > wrote:
    >     > >
    >     > > Hey Aaron,
    >     > >
    >     > > I’m currently setting up Rya to test these queries with some of
    >     > > our data. I ran into an error when I run ‘mvn clean install’; I
    >     > > attached the logs, but it seems like I can’t connect to the
    >     > > snapshots repo you’re using.
    >     > >
    >     > > As for “deep/wide”, it would be something like starting at a
    >     > > dataset, then fanning out looking for relations where it is either
    >     > > the subject or object, such as the user who created it, the job it
    >     > > came from, where it’s stored, etc. It would recurse on these
    >     > > neighboring nodes until a total number of results is reached.
    >     > > However, if the cardinality of a node is too high (for example, a
    >     > > user that owns a large number of datasets), the neighbors of that
    >     > > node will not be found. Really, the goal is to find the most
    >     > > distant relevant relationships possible, and this is our current
    >     > > naïve way of doing so.
    >     > >
    >     > > Do you want to have a short call about this? I think it’d be
    > easier to
    >     > explain/answer questions over the phone. I’m free pretty much any
    > time
    >     > 1pm-5pm PST tomorrow (3/1).
    >     > >
    >     > > Thanks,
    >     > > Eric
    >     > >
    >     > > On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <[email protected]>
    > wrote:
    >     > >
    >     > >    deep vs wide: I played around with the property paths sparql
    >     > >    operator and put up an example here [1].  This is a slightly
    >     > >    different query than the one I sent out before.  It would be
    >     > >    worth it for us to look at how this is actually executed by
    >     > >    OpenRDF.
    >     > >
    >     > >    Eric: Could you clarify "deep vs wide"?  I think I understand
    >     > >    your queries, but I don't have a good intuition about those
    >     > >    terms and how cardinality might figure into a query.  It would
    >     > >    probably be a bit more helpful if you provided a model or
    >     > >    general description that is (somewhat) representative of your
    >     > >    data.
    >     > >
    >     > >    --Aaron
    >     > >
    >     > >    [1]
    >     > >
    >     > >
    >     > > https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
    >     > >
    >     > >>    On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <
    > [email protected]>
    >     > wrote:
    >     > >>
    >     > >> Hi Eric,
    >     > >>
    >     > >> If you want to query by the Accumulo timestamp, something like
    >     > >> timeRange(?ts, 13141201490, 13249201490) should work in Rya. I
    >     > >> did not try it lately, but timeRange() was in Rya originally. Not
    >     > >> sure whether it was removed in later iterations or whether it
    >     > >> would be useful for your use case. The first Rya paper
    >     > >> https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
    >     > >> discusses time ranges (Section 5.3 at the link above).
    >     > >>
    >     > >> Adina
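
    [If timeRange() still behaves as Adina describes, a query over statement
    timestamps might look roughly like the sketch below. This is purely
    illustrative: the placement of timeRange() inside a FILTER and the ?ts
    binding are assumptions, not verified against current Rya syntax.

        SELECT ?s ?p ?o WHERE {
          ?s ?p ?o .
          FILTER(timeRange(?ts, 13141201490, 13249201490))
        }

    Section 5.3 of the Rya paper linked above is the authoritative reference
    for the actual time-range syntax.]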
    >     > >>
    >     > >>> On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <[email protected]
    > >
    >     > wrote:
    >     > >>>
    >     > >>> Hey John,
    >     > >>> I'm pretty sure your pull request was merged-- it was pulled in
    > through
    >     > >>> another pull request.  If not, sorry-- I thought it had been
    > merged and
    >     > >>> then just not closed.  I was going to spend some time doing
    > merges
    >     > >> tomorrow
    >     > >>> so I can get it tomorrow.
    >     > >>>
    >     > >>> Sent from my iPhone
    >     > >>>
    >     > >>>> On Feb 23, 2017, at 8:13 PM, John Smith <[email protected]>
    > wrote:
    >     > >>>>
    >     > >>>> I have a pull request that fixes that problem.. it has been
    > stuck in
    >     > >>> limbo
    >     > >>>> for months..
    > https://github.com/apache/incubator-rya-site/pull/1  Can
    >     > >>>> someone merge it into master?
    >     > >>>>
    >     > >>>>> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <
    > [email protected]>
    >     > >>> wrote:
    >     > >>>>>
    >     > >>>>> Cool, thanks for the help.
    >     > >>>>> By the way, the link to the Rya Manual is outdated on the
    >     > >>>>> rya.apache.org site. Should be pointing at
    >     > >>>>> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
    >     > >>>>>
    >     > >>>>> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <
    > [email protected]>
    >     > >>> wrote:
    >     > >>>>>
    >     > >>>>>   deep vs wide:
    >     > >>>>>
    >     > >>>>>   A property path query is probably your best bet.  Something
    > like:
    >     > >>>>>
    >     > >>>>>   for the following data:
    >     > >>>>>
    >     > >>>>>   s:EventA p:causes s:EventB
    >     > >>>>>   s:EventB p:causes s:EventC
    >     > >>>>>   s:EventC p:causes s:EventD
    >     > >>>>>
    >     > >>>>>
    >     > >>>>>   This query would start at EventB and work its way up and
    >     > >>>>> down the chain:
    >     > >>>>>
    >     > >>>>>   SELECT * WHERE {
    >     > >>>>>      <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
    >     > >>>>>   }
    >     > >>>>>
    >     > >>>>>
    >     > >>>>>   On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <
    >     > >>> [email protected]>
    >     > >>>>>   wrote:
    >     > >>>>>
    >     > >>>>>> Yes, that's a good place to start.  If you have external
    >     > >>>>>> timestamps that are built into your graph using the time
    >     > >>>>>> ontology in OWL (e.g. you have triples of the form (event123,
    >     > >>>>>> time:inDateTime, 2017-02-23T14:29)), the temporal index is
    >     > >>>>>> exactly what you want.  If you are hoping to query based on
    >     > >>>>>> the internal timestamps that Accumulo assigns to your triples,
    >     > >>>>>> then there are some slight tweaks that can be done to
    >     > >>>>>> facilitate this, but it won't be nearly as efficient (this
    >     > >>>>>> will require some sort of client-side filtering).
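    >     > >>>>>>
    >     > >>>>>> [A minimal sketch of the kind of temporal-ontology query
    >     > >>>>>> described above, assuming events carry time:inDateTime
    >     > >>>>>> triples; the prefix IRIs and filter shape here are
    >     > >>>>>> illustrative plain SPARQL, not Rya-specific index syntax:
    >     > >>>>>>
    >     > >>>>>>     PREFIX time: <http://www.w3.org/2006/time#>
    >     > >>>>>>     PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>
    >     > >>>>>>     SELECT ?event ?t WHERE {
    >     > >>>>>>       ?event time:inDateTime ?t .
    >     > >>>>>>       FILTER(?t >= "2017-02-23T00:00:00"^^xsd:dateTime &&
    >     > >>>>>>              ?t <  "2017-02-24T00:00:00"^^xsd:dateTime)
    >     > >>>>>>     }
    >     > >>>>>>
    >     > >>>>>> With a temporal index, Rya can answer such range filters
    >     > >>>>>> without a full scan; without it, the same query falls back to
    >     > >>>>>> filtering every matching triple.]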
    >     > >>>>>>
    >     > >>>>>> Caleb A. Meier, Ph.D.
    >     > >>>>>> Software Engineer II ♦ Analyst
    >     > >>>>>> Parsons Corporation
    >     > >>>>>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
    >     > >>>>>> Office:  (703) 797-3066
    >     > >>>>>> [email protected] ♦ www.parsons.com
    >     > >>>>>>
    >     > >>>>>> -----Original Message-----
    >     > >>>>>> From: Liu, Eric [mailto:[email protected]]
    >     > >>>>>> Sent: Thursday, February 23, 2017 2:27 PM
    >     > >>>>>> To: [email protected]
    >     > >>>>>> Subject: Re: Timestamps and Cardinality in Queries
    >     > >>>>>>
    >     > >>>>>> We’d like to be able to query by timestamp; specifically, we
    >     > >>>>>> want to be able to find all statements that were made within
    >     > >>>>>> a given time range. Is this what I should be looking at?
    >     > >>>>>> https://cwiki.apache.org/confluence/download/attachments/63407907/Rya%20Temporal%20Indexing.pdf?version=1&modificationDate=1464789502000&api=v2
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>> On 2/22/17, 6:21 PM, "Meier, Caleb" <[email protected]>
    >     > wrote:
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>   Hey Eric,
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>   Currently timestamps can't be queried in Rya.  Do you need
    >     > >>>>>> to be able to query by timestamp, or simply discover the
    >     > >>>>>> timestamp for a given node?  Rya does have a temporal index,
    >     > >>>>>> but that requires you to use a temporal ontology to model the
    >     > >>>>>> temporal properties of your graph nodes.
    >     > >>>>>>
    >     > >>>>>>   ________________________________________
    >     > >>>>>>
    >     > >>>>>>   From: Liu, Eric <[email protected]>
    >     > >>>>>>
    >     > >>>>>>   Sent: Wednesday, February 22, 2017 6:38 PM
    >     > >>>>>>
    >     > >>>>>>   To: [email protected]
    >     > >>>>>>
    >     > >>>>>>   Subject: Timestamps and Cardinality in Queries
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>   Hi,
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>   Continuing from our talk earlier today, I was wondering if
    >     > >>>>>> you could provide more information about how timestamps could
    >     > >>>>>> be queried in Rya.
    >     > >>>>>>
    >     > >>>>>>   Also, we are trying to support a type of query that would
    >     > >>>>>> essentially be limiting on cardinality (different from the
    >     > >>>>>> normal SPARQL limit because it’s for node cardinality rather
    >     > >>>>>> than total results). I saw in one of Caleb’s talks that Rya’s
    >     > >>>>>> query optimization involves checking cardinality first. I was
    >     > >>>>>> wondering if there would be some way to tap into this feature
    >     > >>>>>> for use in queries?
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>   Thanks,
    >     > >>>>>>
    >     > >>>>>>   Eric Liu
    >     > >>>>>>
    >     > >>>>>>   ________________________________________________________
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>   The information contained in this e-mail is confidential
    > and/or
    >     > >>>>>> proprietary to Capital One and/or its affiliates and may only
    > be
    >     > used
    >     > >>>>>> solely in performance of work or services for Capital One. 
The
    >     > >>>>> information
    >     > >>>>>> transmitted herewith is intended only for use by the
    > individual or
    >     > >>>>> entity
    >     > >>>>>> to which it is addressed. If the reader of this message is
    > not the
    >     > >>>>> intended
    >     > >>>>>> recipient, you are hereby notified that any review,
    > retransmission,
    >     > >>>>>> dissemination, distribution, copying or other use of, or
    > taking of
    >     > >>>>> any
    >     > >>>>>> action in reliance upon this information is strictly
    > prohibited. If
    >     > >>>>> you
    >     > >>>>>> have received this communication in error, please contact the
    > sender
    >     > >>>>> and
    >     > >>>>>> delete the material from your computer.
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>>
    >     > >>>>>
    >     > >>>>>
    >     > >>>>>
    >     > >>>
    >     > >>
    >     > >>
    >     > >>
    >     > >> --
    >     > >> Dr. Adina Crainiceanu
    >     > >> Associate Professor, Computer Science Department
    >     > >> United States Naval Academy
    >     > >> 410-293-6822
    >     > >> [email protected]
    >     > >> http://www.usna.edu/Users/cs/adina/
    >     > >>
    >     > >
    >     > >
    >     > > <log.txt>
    >     >
    >
    >
    >
    
