I’d like to carry this conversation further, cross-posting to dev list:


I now have possible production use cases for accessing cache key
[metadata].



As an example, suppose I want to scan all keys from a cache that may
contain large amounts of data and perform some operation on a few of them,
based on the value of the key itself.



In this use case the IO bandwidth required for keys & data might be as much
as 1000 times the bandwidth required for keys alone, even when
considering request parallelization and co-location.
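
To put rough numbers on that ratio, here is a minimal back-of-envelope sketch. The entry count, key size, value size and number of selected entries are all hypothetical values I picked for illustration, not measurements from a real cache, and the class and method names are made up for this example:

```java
/** Back-of-envelope comparison of a keys-only scan against a full key-and-value scan. */
public class KeyScanEstimate {

    /** Bytes moved when every key and every value is read. */
    public static long fullScanBytes(long entries, long keyBytes, long valueBytes) {
        return entries * (keyBytes + valueBytes);
    }

    /** Bytes moved when all keys are read but values are fetched only for the selected entries. */
    public static long keyScanBytes(long entries, long keyBytes, long valueBytes, long selected) {
        return entries * keyBytes + selected * valueBytes;
    }

    public static void main(String[] args) {
        // Hypothetical figures, chosen only to illustrate the order of magnitude.
        long entries = 1_000_000L;
        long keyBytes = 64L;
        long valueBytes = 64_000L;
        long selected = 10L;

        long full = fullScanBytes(entries, keyBytes, valueBytes);
        long keysFirst = keyScanBytes(entries, keyBytes, valueBytes, selected);

        System.out.println("Full scan bytes:       " + full);
        System.out.println("Keys-first scan bytes: " + keysFirst);
        System.out.println("Ratio:                 " + (double) full / keysFirst);
    }
}
```

With values roughly a thousand times larger than keys and only a handful of entries selected, the ratio comes out close to the value-to-key size ratio, which is where the estimate above comes from.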



I imagine that Ignite can internally scan cache keys as a part of its
internal query operations. Is that correct? If so, would it be difficult to
expose this kind of functionality in the Ignite API?



Thanks,

Raymond.



*From:* Raymond Wilson [mailto:raymond_wil...@trimble.com]
*Sent:* Monday, December 4, 2017 11:26 PM
*To:* 'u...@ignite.apache.org' <u...@ignite.apache.org>
*Subject:* RE: Obtaining metadata about items in the cache



Thanks Alexey.



This would certainly reduce the IO, but does still require all the data to
be read.



My use case is not really a production one: I want to iterate over all items
in the cache to determine whether the page size for persistence was suitable.
Reading all the data is not too painful, but a metadata scan would be much
faster, especially if spread across the cluster as in your example below.



Raymond.



*From:* Alexey Kukushkin [mailto:kukushkinale...@gmail.com
<kukushkinale...@gmail.com>]
*Sent:* Monday, December 4, 2017 11:10 PM
*To:* u...@ignite.apache.org
*Subject:* Re: Obtaining metadata about items in the cache



Hi Raymond,



I do not think Ignite supports iterating over metadata, but you could
minimise IO by:

   - collocated processing (analyse entries locally without sending them
   over the network)
   - working with binary object representation directly (without
   serialisation/deserialisation)

You could send your analysis job to each partition and then execute a local
scan query that works with binary objects. The code below uses the
affinityRun, withKeepBinary and setLocal methods you need to achieve the
above optimizations:



IgniteCompute compute = ignite.compute(ignite.cluster().forServers());

for (int i = 0; i < ignite.affinity("CacheName").partitions(); ++i) {
    final int part = i; // effectively final copy for use inside the lambda

    // Run the job on the node that owns this partition.
    compute.affinityRun(Collections.singletonList("CacheName"), part, () -> {
        // Work with binary objects to avoid deserialisation.
        IgniteCache<BinaryObject, BinaryObject> cache =
            ignite.cache("CacheName").withKeepBinary();

        ScanQuery<BinaryObject, BinaryObject> qry = new ScanQuery<>((k, v) -> { ... });

        qry.setLocal(true);      // scan only data held on this node
        qry.setPartition(part);  // scan only this job's partition

        try (QueryCursor<Cache.Entry<BinaryObject, BinaryObject>> cur = cache.query(qry)) {
            ...
        }
    });
}











On Mon, Dec 4, 2017 at 1:33 AM, Raymond Wilson <raymond_wil...@trimble.com>
wrote:

Hi,



I’d like to be able to scan all the items in a cache where all I am
interested in is the cache key and other metadata about the cached item
(such as its size).



I can do this now by running a cache query that simply reads out all the
cache items, but this is a lot of IO when I don’t care about the content of
the items themselves.



Does anyone here do this?



Thanks,

Raymond.







-- 

Best regards,

Alexey
