Re: Regarding Image Captioning in Tika for Image MIME Types

Thamme Gowda Fri, 24 Mar 2017 11:36:52 -0700

Hi Kranthi Kiran,

Please find my replies below:


Let me know if you have more questions.

Thanks,
TG
*--*
*Thamme Gowda*
TG | @thammegowda <https://twitter.com/thammegowda>
~Sent via somebody's Webmail server!

On Tue, Mar 21, 2017 at 12:21 PM, Kranthi Kiran G V <
[email protected]> wrote:

> Hello Thamme Gowda,
>
> Thank you for letting me know about the developer mailing list. I have
> created an issue [1] and I will be working on it.
> The change is not straightforward since the Inception V3 pre-trained model
> has a graph while the Inception V4 pre-trained model is packaged in the form
> of a checkpoint (ckpt) [2].
>

Okay, I see Inception-V3 has a graph, V4 has a checkpoint.
I assume there should be a way to restore the model from a checkpoint? Please
refer to
https://www.tensorflow.org/programmers_guide/variables#checkpoint_files
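
For illustration only, restoring the checkpoint and freezing it into a single
graph file might look roughly like the sketch below. It assumes TensorFlow 1.x
and the slim `nets` package from tensorflow/models on the PYTHONPATH. The
helper names, the "input" placeholder name, and the output node
"InceptionV4/Logits/Predictions" are assumptions made for this example, not
something Tika ships.

```python
import os


def checkpoint_present(ckpt_path):
    """True if a single-file (V1) or .index (V2) checkpoint exists at this prefix."""
    return os.path.isfile(ckpt_path) or os.path.isfile(ckpt_path + ".index")


def freeze_inception_v4(ckpt_path, out_pb,
                        output_node="InceptionV4/Logits/Predictions"):
    """Rebuild the Inception V4 graph, restore its weights, write a frozen .pb.

    Hypothetical helper: the output node name is an assumption and should be
    checked against the graph that is actually built.
    """
    if not checkpoint_present(ckpt_path):
        raise IOError("checkpoint not found: " + ckpt_path)

    # Imported lazily: needs TensorFlow 1.x and slim's `nets` package.
    import tensorflow as tf
    from nets import inception

    slim = tf.contrib.slim
    with tf.Graph().as_default() as graph:
        # A checkpoint holds only variable values, so the graph structure
        # has to be rebuilt in code before the values can be restored.
        images = tf.placeholder(tf.float32, [None, 299, 299, 3], name="input")
        with slim.arg_scope(inception.inception_v4_arg_scope()):
            inception.inception_v4(images, num_classes=1001, is_training=False)

        with tf.Session(graph=graph) as sess:
            tf.train.Saver().restore(sess, ckpt_path)
            # Fold the restored variables into constants so the result is a
            # self-contained graph, similar to how the V3 model is packaged.
            frozen = tf.graph_util.convert_variables_to_constants(
                sess, graph.as_graph_def(), [output_node])

    with tf.gfile.GFile(out_pb, "wb") as f:
        f.write(frozen.SerializeToString())
    return out_pb
```

Something like freeze_inception_v4("inception_v4.ckpt", "inception_v4.pb")
would then produce a graph file that could be loaded the same way as the
current V3 graph, provided the input and output node names are adjusted.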


>
> What do you think of using Keras to implement the Inception V4 model? It
> would make the job of scaling it on CPU clusters easier if we can use
> deeplearning4j's model import.
>
> Should I proceed in that direction?
>
> Regarding GSoC, what kind of computation resources are we given access to?
> We would have to train the show and tell network. It takes a lot of
> computation resources.
>
> If GPUs are not used, we would have to use a CPU cluster. So, the code has
> to be re-written (from the Google implementation of Inception V4).
>
>
Training InceptionV4 from scratch requires too much effort, time, and
resources. We are not aiming for such things, at least not as part of Tika
and GSoC. The suggestion I mentioned earlier was to replace the InceptionV3
model with the Inception V4 pretrained model/checkpoint, since that will be
more beneficial to the Tika user community :-)



>
> [1] https://issues.apache.org/jira/browse/TIKA-2306
> [2] https://github.com/tensorflow/models/tree/master/slim#pre-trained-models
>
>
>
>
>
> On Mon, Mar 20, 2017 at 3:17 AM, Thamme Gowda <[email protected]>
> wrote:
>
>> Hi Kranthi Kiran,
>>
>> Welcome to the Tika community. We are glad you are interested in working
>> on the issue.
>> Please remember to CC the dev@tika mailing list for future discussions
>> related to Tika.
>>
>>  *Should the model be trainable by the user?*
>> The minimum requirement is to provide a pre-trained model and make
>> the parser work out of the box without training (expect no GPUs; expect
>> a JVM and nothing else).
>> Of course, the parser configuration should have options to change the
>> models by changing the path.
>>
>> As part of this GSoC project, integration isn't enough work. If you go
>> through the links provided in the Jira page you will notice that there are
>> models for image recognition but no ready-made models for captioning. We
>> will have to train the im2text network from the dataset and make it
>> available. Thus we will have to open source the training utilities,
>> documentation or any supplementary tools we build along the way. We will
>> have to document all these in Tika wiki for the advanced users!
>>
>> This is a GSoC issue and thus we expect to work on it during the summer.
>>
>> For now, if you want a small task to familiarise yourself with Tika, I
>> have a suggestion:
>> Currently, Tika uses the InceptionV3 model from Google for image
>> recognition. The InceptionV4 model was released recently and has proved to
>> be more accurate than V3.
>>
>> How about upgrading Tika to use the newer Inception model?
>>
>> Let me know if you have more questions.
>>
>> Cheers,
>> TG
>>
>> *--*
>> *Thamme Gowda*
>> TG | @thammegowda <https://twitter.com/thammegowda>
>> ~Sent via somebody's Webmail server!
>>
>> On Sun, Mar 19, 2017 at 11:56 AM, Kranthi Kiran G V <
>> [email protected]> wrote:
>>
>>> Hello,
>>> I'm Kranthi, a 3rd-year computer science undergrad at NIT, Warangal and a
>>> member of the Deep Learning research group at our college. I'm interested
>>> in taking up the issue. I believe it would be a great contribution to the Apache
>>> Tika community.
>>>
>>> This is what I have done until now:
>>>
>>> 1) Built Tika from source using Maven and explored it.
>>> 2) Tried the object recognition module from the command line. (I should
>>> probably start using the docker version to speed up my progress.)
>>>
>>> I am yet to import a Keras model in dl4j. I have some doubts regarding
>>> the requirements since I'm new to this community. *Should the model be
>>> trainable by the user?* This is important because the Inception v3
>>> model without re-training has performed poorly for me (I'm currently
>>> training it with fewer steps due to the limited computational
>>> resources I have -- GTX 1070).
>>>
>>> TODO (Before submitting the proposal):
>>>
>>> 1) Create a test REST API for Tika
>>> 2) Import a few models in dl4j.
>>> 3) Train im2txt on my computer.
>>>
>>> Thank you,
>>> Kranthi Kiran
>>>
>>
>>
>
