Murtadha Hubail has posted comments on this change.

Change subject: Add support for persistent local resources to 
IndexLifecycleManager
......................................................................


Patch Set 2:

To answer your first point, here is some background behind the reason of this 
change.

a) A resource location (Name) is passed to IndexDataFlowHelper as an absolute 
path from a FileSplit.
b) The current design of PersistentLocalResourceRepository assumes that the 
mapping information between (Resource Name -> LocalResource) is already in 
memory, and all resources info is loaded in memory during startup. 
c) We also need to keep a different mapping information between (ID -> 
LocalResource) since the API of IIndexLifecycleManager (DatasetLifecycleManager 
in Asterix) is only using the resource id, so if it needs to access any other 
info regarding this resource (e.g. Dataset ID), it needs to use the (ID -> 
LocalResource) map. If we have a huge number of resources, all of that info 
will be kept in memory all the time until a resource is destroyed.

I fixed this on the Asterix side of this change and made 
PersistentLocalResourceRepository load resources on demand and there is a cache 
between (Resource Name -> LocalResource) with a certain max size and got rid of 
the two mappings. But now DatasetLifecycleManager won’t be able to access the 
resource using the ID only. That’s why I introduced this change here to pass 
the name instead of the id. Using the Name, it will ask 
PersistentLocalResourceRepository for the resource using its Name and it will 
be fetched from the cache or disk.

Another option I could’ve done is to pass the whole LocalResource instead of 
the name. This way DatasetLifecycleManager wouldn’t need to fetch it again at 
all. But then I found that we have a logic that is based on checking whether 
that resource actually exists or not.

Coming back to the rest of your points:
2) I could change the interface to use Object, but it will again break the 
existing code of other clients.

3) If we all agree that String is the better identifier, I’m going to add 
deprecated to the resource ID methods.

4) Clients do not need to make this check since they should know what type of 
ILocalResourceRepository they are using. I added this check here because 
IndexDataflowHelper is a shared class between our clients. New clients could 
create a new class which implements IIndexDataflowHelper that doesn’t need to 
make this check. Hopefully we could implement point 3 and git rid of this check 
as well. I also wanted to make this check based on instanceof instead of a 
boolean but couldn’t due to projects dependancies.

Hope this answers your points.

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/343
To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I2e8e974fc2f746959639ce94351f8e419a7f9093
Gerrit-PatchSet: 2
Gerrit-Project: hyracks
Gerrit-Branch: master
Gerrit-Owner: Murtadha Hubail <[email protected]>
Gerrit-Reviewer: Ian Maxon <[email protected]>
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Murtadha Hubail <[email protected]>
Gerrit-Reviewer: Till Westmann <[email protected]>
Gerrit-Reviewer: Yingyi Bu <[email protected]>
Gerrit-Reviewer: Young-Seok Kim <[email protected]>
Gerrit-Reviewer: abdullah alamoudi <[email protected]>
Gerrit-HasComments: No

Reply via email to