Re: Timestamps and Cardinality in Queries

Aaron D. Mihalik Wed, 01 Mar 2017 12:31:32 -0800

That's really strange.  Can you hit the maven central repo [1] from your
machine?


I guess delete the locationtech <repository> definition from your pom?


[1] http://repo1.maven.org/maven2/org/apache/apache/17/
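
If it helps, removing those <repository> entries can be scripted instead of edited by hand. This is only a sketch: the function name drop_repositories is made up for illustration, and it assumes the pom uses the standard Maven 4.0.0 namespace and that matching on the repository URL is good enough. ElementTree drops XML comments when it re-serializes, so diff the result before keeping it.

```python
import xml.etree.ElementTree as ET

POM_NS = "http://maven.apache.org/POM/4.0.0"


def drop_repositories(pom_xml: str, url_fragment: str) -> str:
    """Return pom_xml without <repository> entries whose <url> contains url_fragment."""
    ET.register_namespace("", POM_NS)  # keep the default namespace on output
    root = ET.fromstring(pom_xml)
    ns = {"m": POM_NS}
    # Snapshot the <repositories> blocks first so removal cannot disturb iteration.
    # This covers top-level and profile-level blocks alike.
    for repos in list(root.iter(f"{{{POM_NS}}}repositories")):
        for repo in list(repos.findall("m:repository", ns)):
            url = repo.findtext("m:url", default="", namespaces=ns)
            if url_fragment in url:
                repos.remove(repo)
    return ET.tostring(root, encoding="unicode")
```

Calling drop_repositories(pom_text, "locationtech") would return the pom text with those repository entries removed and everything else left in place.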

On Wed, Mar 1, 2017 at 2:31 PM Liu, Eric <[email protected]> wrote:

> Hmmm, deleting the files in .m2 doesn’t stop it from searching in
> locationtech, and using the other mvn command gives me no log output.
>
> On 3/1/17, 10:55 AM, "Aaron D. Mihalik" <[email protected]> wrote:
>
>     traversing: gotcha.  I completely understand now.  And now I understand
>     how the prospector table would help with snipping out those nodes.
>
>     maven: yep, that's the right git repo.  Locationtech is required when you
>     build with the 'geoindexing' profile.  Regardless, it's strange that maven
>     tried to get the apache pom from locationtech.  Deleting the
>     org/apache/apache directory should force maven to download the apache pom
>     from maven central.
>
>     --Aaron
>
>     On Wed, Mar 1, 2017 at 1:47 PM Liu, Eric <[email protected]> wrote:
>
>     > Oh, that’s not an issue, that’s what we would like to do when traversing
>     > through the data. If a node has a high cardinality we don’t want to further
>     > traverse through its children.
>     >
>     > As for installation, did I clone the right repo for Rya? The one I’m using
>     > has locationtech repos for SNAPSHOT and RELEASE:
>     > https://github.com/apache/incubator-rya/blob/master/pom.xml
>     >
>     > On 3/1/17, 6:09 AM, "Aaron D. Mihalik" <[email protected]> wrote:
>     >
>     >     Repos: The locationtech repo is up [1].  The issue is that your local .m2
>     >     repo is in a bad state.  Maven is trying to get the apache pom from
>     >     locationtech.  Locationtech does not host that pom, instead it's on maven
>     >     central [2].
>     >
>     >     Two ways to fix this issue (you should do (1) and that'll fix it...
>     >     (2) is just another option for reference).
>     >
>     >     1. Delete your apache pom directory from your local maven repo
>     >     (e.g. rm -rf ~/.m2/repository/org/apache/apache/)
>     >
>     >     2. Tell maven to ignore remote repository metadata with the -llr flag
>     >     (e.g. mvn clean install -llr -Pgeoindexing)
>     >
>     >     Let me know if you have any other issues.
>     >
>     >     deep/wide: okay, I don't understand this statement: "if the cardinality of
>     >     a node is too high (for example, a user that owns a large number of
>     >     datasets), the neighbors of that node will not be found."  Is this a
>     >     property of your current datastore, or is this an issue with Rya?
>     >
>     >     --Aaron
>     >
>     >     [1] https://repo.locationtech.org/content/repositories/releases/org/locationtech/geomesa/
>     >     [2] http://repo1.maven.org/maven2/org/apache/apache/17/
>     >
>     >     On Wed, Mar 1, 2017 at 7:43 AM Puja Valiyil <[email protected]>
> wrote:
>     >
>     >     > Hey Eric,
>     >     > Regarding the repos-- sometimes the location tech repos go
> down,
>     > your best
>     >     > bet is to wait a little bit and try again.  You can also
> download the
>     >     > latest artifacts off of the apache build server.
>     >     > Since location tech is only used for the geo profile we may
> want to
>     > move
>     >     > where that repo is declared (or put it in the geo profile).
>     >     > For your use case, you could look to use the cardinality in the
>     > prospector
>     >     > services for individual nodes.  Though the prospector services
> could
>     > be run
>     >     > once and then used to be representative (that wouldn't work
> for your
>     > use
>     >     > case), you could run them regularly to keep track of counts
> for your
>     > use
>     >     > case.  Are you using the count keyword or just manually
> counting
>     > edges?
>     >     > The count keyword is pretty inefficient currently.  We could
> add
>     > that to
>     >     > our list of priorities maybe.
>     >     >
>     >     > Sent from my iPhone
>     >     >
>     >     > > On Mar 1, 2017, at 3:00 AM, Liu, Eric <
> [email protected]>
>     > wrote:
>     >     > >
>     >     > > Hey Aaron,
>     >     > >
>     >     > > I’m currently setting up Rya to test these queries with some
> of our
>     >     > data. I run into an error when I run ‘mvn clean install’, I
> attached
>     > the
>     >     > logs but it seems like I can’t connect to the snapshots repo
> you’re
>     > using.
>     >     > >
>     >     > > As for “deep/wide”, it would be something like starting at a
>     > dataset,
>     >     > then fanning out looking for relations where it is either the
>     > subject or
>     >     > object, such as the user who created it, the job it came from,
> where
>     > it’s
>     >     > stored, etc. It would recurse on these neighboring nodes until
> a
>     > total
>     >     > number of results is reached. However, if the cardinality of a
> node
>     > is too
>     >     > high (for example, a user that owns a large number of
> datasets), the
>     >     > neighbors of that node will not be found. Really, the goal is
> to
>     > find the
>     >     > most distant relevant relationships possible, and this is our
>     > current
>     >     > naïve way of doing so.
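
To make the traversal described above concrete, here is a small in-memory sketch. It is not how Rya or the prospector service does it. It assumes "cardinality" means the number of distinct neighbours of a node, counted in either direction, and the names bounded_neighbors, max_degree and max_results are invented for the example.

```python
from collections import defaultdict, deque


def bounded_neighbors(triples, start, max_degree, max_results):
    """Breadth-first walk that reports, but does not expand, high-degree nodes."""
    # Undirected adjacency: a node is linked to every node it shares a triple with,
    # whether it appears as the subject or as the object.
    adjacent = defaultdict(set)
    for s, _p, o in triples:
        adjacent[s].add(o)
        adjacent[o].add(s)

    seen = {start}
    found = []
    queue = deque([start])
    while queue and len(found) < max_results:
        node = queue.popleft()
        if len(adjacent[node]) > max_degree:
            continue  # too many neighbours: keep the node, skip its children
        for nxt in sorted(adjacent[node]):
            if nxt in seen:
                continue
            seen.add(nxt)
            found.append(nxt)
            queue.append(nxt)
            if len(found) >= max_results:
                break
    return found
```

With a user that owns many datasets, the user itself still shows up in the results, but none of the other datasets reachable only through that user do.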
>     >     > >
>     >     > > Do you want to have a short call about this? I think it’d be
>     > easier to
>     >     > explain/answer questions over the phone. I’m free pretty much
> any
>     > time
>     >     > 1pm-5pm PST tomorrow (3/1).
>     >     > >
>     >     > > Thanks,
>     >     > > Eric
>     >     > >
>     >     > > On 2/24/17, 6:18 AM, "Aaron D. Mihalik" <
> [email protected]>
>     > wrote:
>     >     > >
>     >     > >    deep vs wide: I played around with the property paths
> sparql
>     > operator
>     >     > and
>     >     > >    put up an example here [1].  This is a slightly different
> query
>     > than
>     >     > the
>     >     > >    one I sent out before.  It would be worth it for us to
> look at
>     > how
>     >     > this is
>     >     > >    actually executed by OpenRDF.
>     >     > >
>     >     > >    Eric: Could you clarify what you mean by "deep vs wide"?  I think I
>     > understand your
>     >     > >    queries, but I don't have a good intuition about those
> terms
>     > and how
>     >     > >    cardinality might figure into a query.  It would probably
> be a
>     > bit
>     >     > more
>     >     > >    helpful if you provided a model or general description
> that is
>     >     > (somewhat)
>     >     > >    representative of your data.
>     >     > >
>     >     > >    --Aaron
>     >     > >
>     >     > >    [1] https://github.com/amihalik/sesame-debugging/blob/master/src/main/java/com/github/amihalik/sesame/debugging/PropertyPathsExample.java
>     >     > >
>     >     > >>    On Thu, Feb 23, 2017 at 9:42 PM Adina Crainiceanu <
>     > [email protected]>
>     >     > wrote:
>     >     > >>
>     >     > >> Hi Eric,
>     >     > >>
>     >     > >> If you want to query by the Accumulo timestamp, something
> like
>     >     > >> timeRange(?ts, 13141201490, 13249201490) should work in
> Rya. I
>     > did not
>     >     > try
>     >     > >> it lately, but timeRange() was in Rya originally. Not sure
> if it
>     > was
>     >     > >> removed in later iterations or whether it would be useful
> for
>     > your use
>     >     > >> case. First Rya paper
>     >     > >>
> https://www.usna.edu/Users/cs/adina/research/Rya_CloudI2012.pdf
>     >     > discusses
>     >     > >> time ranges (Section 5.3 at the link above)
>     >     > >>
>     >     > >> Adina
>     >     > >>
>     >     > >>> On Thu, Feb 23, 2017 at 8:31 PM, Puja Valiyil <
> [email protected]
>     > >
>     >     > wrote:
>     >     > >>>
>     >     > >>> Hey John,
>     >     > >>> I'm pretty sure your pull request was merged-- it was
> pulled in
>     > through
>     >     > >>> another pull request.  If not, sorry-- I thought it had
> been
>     > merged and
>     >     > >>> then just not closed.  I was going to spend some time doing
>     > merges
>     >     > >> tomorrow
>     >     > >>> so I can get it tomorrow.
>     >     > >>>
>     >     > >>> Sent from my iPhone
>     >     > >>>
>     >     > >>>> On Feb 23, 2017, at 8:13 PM, John Smith <
> [email protected]>
>     > wrote:
>     >     > >>>>
>     >     > >>>> I have a pull request that fixes that problem.. it has
> been
>     > stuck in
>     >     > >>> limbo
>     >     > >>>> for months..
>     > https://github.com/apache/incubator-rya-site/pull/1  Can
>     >     > >>>> someone merge it into master?
>     >     > >>>>
>     >     > >>>>> On Thu, Feb 23, 2017 at 2:00 PM, Liu, Eric <
>     > [email protected]>
>     >     > >>> wrote:
>     >     > >>>>>
>     >     > >>>>> Cool, thanks for the help.
>     >     > >>>>> By the way, the link to the Rya Manual is outdated on the rya.apache.org
>     >     > >>>>> site. Should be pointing at
>     >     > >>>>> https://github.com/apache/incubator-rya/blob/master/extras/rya.manual/src/site/markdown/_index.md
>     >     > >>>>>
>     >     > >>>>> On 2/23/17, 12:34 PM, "Aaron D. Mihalik" <
>     > [email protected]>
>     >     > >>> wrote:
>     >     > >>>>>
>     >     > >>>>>   deep vs wide:
>     >     > >>>>>
>     >     > >>>>>   A property path query is probably your best bet.
> Something
>     > like:
>     >     > >>>>>
>     >     > >>>>>   for the following data:
>     >     > >>>>>
>     >     > >>>>>   s:EventA p:causes s:EventB
>     >     > >>>>>   s:EventB p:causes s:EventC
>     >     > >>>>>   s:EventC p:causes s:EventD
>     >     > >>>>>
>     >     > >>>>>
>     >     > >>>>>   This query would start at EventB and work its way up and down the
>     >     > >>>>>   chain:
>     >     > >>>>>
>     >     > >>>>>   SELECT * WHERE {
>     >     > >>>>>      <s:EventB> (<p:causes>|^<p:causes>)* ?s . ?s ?p ?o
>     >     > >>>>>   }
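
For anyone who wants to see what that property path matches without a triple store at hand, the following sketch emulates it over plain tuples. It only illustrates the semantics, not how OpenRDF or Rya evaluates the query, and the function names are made up for the example.

```python
def causes_closure(triples, start, predicate="p:causes"):
    """Nodes reachable from start by following predicate in either direction, zero or more times."""
    reached = {start}  # the zero-length path matches the start node itself
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if p != predicate:
                continue
            # Try the edge forwards (s -> o) and inverted (o -> s).
            for here, there in ((s, o), (o, s)):
                if here == node and there not in reached:
                    reached.add(there)
                    frontier.append(there)
    return reached


def statements_about(triples, nodes):
    """All triples whose subject is one of nodes (the '?s ?p ?o' part of the query)."""
    return [t for t in triples if t[0] in nodes]
```

On the three example triples, starting from s:EventB, the closure covers all four events, and statements_about then returns the three p:causes triples, since s:EventD is never a subject.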
>     >     > >>>>>
>     >     > >>>>>
>     >     > >>>>>   On Thu, Feb 23, 2017 at 2:58 PM Meier, Caleb <
>     >     > >>> [email protected]>
>     >     > >>>>>   wrote:
>     >     > >>>>>
>     >     > >>>>>> Yes, that's a good place to start.  If you have external
>     > timestamps
>     >     > >>>>> that
>     >     > >>>>>> are built into your graph using the time ontology in
> owl (e.g.
>     > you
>     >     > >>>>> have
>     >     > >>>>>> triples of the form (event123, time:inDateTime,
>     > 2017-02-23T14:29)),
>     >     > >>>>> the
>     >     > >>>>>> temporal index is exactly what you want.  If you are
> hoping
>     > to query
>     >     > >>>>> based
>     >     > >>>>>> on the internal timestamps that Accumulo assigns to your
>     > triples,
>     >     > >>>>> then
>     >     > >>>>>> there are some slight tweaks that can be done to
> facilitate
>     > this,
>     >     > >>>>> but it
>     >     > >>>>>> won't be nearly as efficient (this will require some
> sort of
>     > client
>     >     > >>>>> side
>     >     > >>>>>> filtering).
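
As a rough picture of what client-side filtering means here: every statement is fetched together with its timestamp, and the range check happens afterwards on the client. The sketch below assumes timestamps are integer milliseconds and that both ends of the range are inclusive. Neither assumption comes from Rya, and within_time_range is a made-up name.

```python
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]


def within_time_range(
    entries: Iterable[Tuple[Triple, int]], start_ms: int, end_ms: int
) -> Iterator[Tuple[Triple, int]]:
    """Yield (triple, timestamp) pairs whose timestamp lies in [start_ms, end_ms]."""
    for triple, timestamp in entries:
        if start_ms <= timestamp <= end_ms:
            yield triple, timestamp
```

Every entry still has to cross the wire before it can be discarded, which is why this is expected to be much slower than an index built for the purpose.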
>     >     > >>>>>>
>     >     > >>>>>> Caleb A. Meier, Ph.D.
>     >     > >>>>>> Software Engineer II ♦ Analyst
>     >     > >>>>>> Parsons Corporation
>     >     > >>>>>> 1911 N. Fort Myer Drive, Suite 800 ♦ Arlington, VA 22209
>     >     > >>>>>> Office:  (703)797-3066
>     >     > >>>>>> [email protected] ♦ www.parsons.com
>     >     > >>>>>>
>     >     > >>>>>> -----Original Message-----
>     >     > >>>>>> From: Liu, Eric [mailto:[email protected]]
>     >     > >>>>>> Sent: Thursday, February 23, 2017 2:27 PM
>     >     > >>>>>> To: [email protected]
>     >     > >>>>>> Subject: Re: Timestamps and Cardinality in Queries
>     >     > >>>>>>
>     >     > >>>>>> We’d like to be able to query by timestamp;
> specifically, we
>     > want to
>     >     > >>>>> be
>     >     > >>>>>> able to find all statements that were made within a
> given time
>     >     > >>>>> range. Is
>     >     > >>>>>> this what I should be looking at?
>     >     > >>>>>>
>     >     > >>>>>> https://cwiki.apache.org/confluence/download/attachments/63407907/Rya%20Temporal%20Indexing.pdf?version=1&modificationDate=1464789502000&api=v2
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>> On 2/22/17, 6:21 PM, "Meier, Caleb" <
> [email protected]>
>     >     > wrote:
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>   Hey Eric,
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>   Currently timestamps can't be queried in Rya.  Do you
> need
>     > to be
>     >     > >>>>> able
>     >     > >>>>>> to query by timestamp, or simply discover the timestamp
> for a
>     > given
>     >     > >>>>> node?
>     >     > >>>>>> Rya does have a temporal index, but that requires you
> to use a
>     >     > >>>>> temporal
>     >     > >>>>>> ontology to model the temporal properties of your graph
> nodes.
>     >     > >>>>>>
>     >     > >>>>>>   ________________________________________
>     >     > >>>>>>
>     >     > >>>>>>   From: Liu, Eric <[email protected]>
>     >     > >>>>>>
>     >     > >>>>>>   Sent: Wednesday, February 22, 2017 6:38 PM
>     >     > >>>>>>
>     >     > >>>>>>   To: [email protected]
>     >     > >>>>>>
>     >     > >>>>>>   Subject: Timestamps and Cardinality in Queries
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>   Hi,
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>   Continuing from our talk earlier today I was
> wondering if
>     > you
>     >     > >>>>> could
>     >     > >>>>>> provide more information about how timestamps could be
>     > queried in
>     >     > >>>>> Rya.
>     >     > >>>>>>
>     >     > >>>>>>   Also, we are trying to support a type of query that
> would
>     >     > >>>>> essentially
>     >     > >>>>>> be limiting on cardinality (different from the normal
> SPARQL
>     > limit
>     >     > >>>>> because
>     >     > >>>>>> it’s for node cardinality rather than total results). I
> saw
>     > in one
>     >     > of
>     >     > >>>>>> Caleb’s talks that Rya’s query optimization involves
> checking
>     >     > >>>>> cardinality
>     >     > >>>>>> first. I was wondering if there would be some way to
> tap into
>     > this
>     >     > >>>>> feature
>     >     > >>>>>> for usage in queries?
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>   Thanks,
>     >     > >>>>>>
>     >     > >>>>>>   Eric Liu
>     >     > >>>>>>
>     >     > >>>>>>
>  ________________________________________________________
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>   The information contained in this e-mail is
> confidential
>     > and/or
>     >     > >>>>>> proprietary to Capital One and/or its affiliates and
> may only
>     > be
>     >     > used
>     >     > >>>>>> solely in performance of work or services for Capital
> One. The
>     >     > >>>>> information
>     >     > >>>>>> transmitted herewith is intended only for use by the
>     > individual or
>     >     > >>>>> entity
>     >     > >>>>>> to which it is addressed. If the reader of this message
> is
>     > not the
>     >     > >>>>> intended
>     >     > >>>>>> recipient, you are hereby notified that any review,
>     > retransmission,
>     >     > >>>>>> dissemination, distribution, copying or other use of, or
>     > taking of
>     >     > >>>>> any
>     >     > >>>>>> action in reliance upon this information is strictly
>     > prohibited. If
>     >     > >>>>> you
>     >     > >>>>>> have received this communication in error, please
> contact the
>     > sender
>     >     > >>>>> and
>     >     > >>>>>> delete the material from your computer.
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>>
>     >     > >>>>>
>     >     > >>>>>
>     >     > >>>>>
>     >     > >>>
>     >     > >>
>     >     > >>
>     >     > >>
>     >     > >> --
>     >     > >> Dr. Adina Crainiceanu
>     >     > >> Associate Professor, Computer Science Department
>     >     > >> United States Naval Academy
>     >     > >> 410-293-6822
>     >     > >> [email protected]
>     >     > >> http://www.usna.edu/Users/cs/adina/
>     >     > >>
>     >     > >
>     >     > >
>     >     > > <log.txt>
>     >     >
>     >
>     >
>     >
>
>
>
