Hello mentors,

I have released a trained model of the neural image captioning system, im2txt. It can be found here: https://github.com/KranthiGV/Pretrained-Show-and-Tell-model

I am hopeful it will benefit both the research community and the Apache Tika community for image captioning. Have a look at it!
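For anyone who wants to try it out quickly, here is a minimal inference sketch, essentially what im2txt's run_inference.py does. The paths are placeholders for wherever you put the released checkpoint and the word_counts.txt vocabulary, and it assumes the im2txt package from tensorflow/models is on your PYTHONPATH.

    import math

    import tensorflow as tf

    from im2txt import configuration
    from im2txt import inference_wrapper
    from im2txt.inference_utils import caption_generator
    from im2txt.inference_utils import vocabulary

    # Placeholder paths: point these at the released checkpoint, the
    # word_counts.txt vocabulary, and any JPEG you want captioned.
    CHECKPOINT_PATH = "/path/to/model.ckpt"
    VOCAB_FILE = "/path/to/word_counts.txt"
    IMAGE_FILE = "/path/to/image.jpg"

    # Build the inference graph from the checkpoint, then freeze it.
    g = tf.Graph()
    with g.as_default():
        model = inference_wrapper.InferenceWrapper()
        restore_fn = model.build_graph_from_config(
            configuration.ModelConfig(), CHECKPOINT_PATH)
    g.finalize()

    vocab = vocabulary.Vocabulary(VOCAB_FILE)

    with tf.Session(graph=g) as sess:
        restore_fn(sess)
        generator = caption_generator.CaptionGenerator(model, vocab)
        with tf.gfile.GFile(IMAGE_FILE, "rb") as f:
            image = f.read()
        for i, caption in enumerate(generator.beam_search(sess, image)):
            # Drop the <S> and </S> sentence-boundary tokens.
            words = [vocab.id_to_word(w) for w in caption.sentence[1:-1]]
            print("%d) %s (p=%f)" % (i, " ".join(words), math.exp(caption.logprob)))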
Thank you,
Kranthi Kiran GV,
CS 3/4 Undergrad, NIT Warangal

On Wed, Mar 29, 2017 at 6:50 PM, Mattmann, Chris A (3010) <[email protected]> wrote:

Sounds great, and understood. Please prepare your proposal and share it with Thamme and me for feedback as your (potential) mentors.

Thanks much.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

*From:* Kranthi Kiran G V <[email protected]>
*Date:* Wednesday, March 29, 2017 at 9:17 AM
*To:* Thamme Gowda <[email protected]>
*Cc:* Chris Mattmann <[email protected]>, "[email protected]" <[email protected]>
*Subject:* Re: Regarding Image Captioning in Tika for Image MIME Types

Hello,

1) I have submitted a PR, which can be found here <https://github.com/apache/tika/pull/163>.

2) After working on the Show and Tell model for a week, I realized that the computational resources I have are enough to take up the challenge.

Here is a sample caption I generated after a few days of training:

INFO:tensorflow:Loading model from checkpoint: /media/timberners/magicae/models/im2txt/im2txt/model/train/model.ckpt-174685
INFO:tensorflow:Successfully loaded checkpoint: model.ckpt-174685
Captions for image COCO_val2014_000000224477.jpg:
0) a man riding a wave on top of a surfboard . (p=0.016002)
1) a man riding a surfboard on a wave in the ocean . (p=0.007747)
2) a man riding a wave on a surfboard in the ocean . (p=0.007673)

The evaluation is on the image in the example on im2txt's page <https://github.com/tensorflow/models/tree/master/im2txt#generating-captions>.
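(For clarity: the p values above are just the exponentiated log-probabilities that the beam search accumulates over each caption's words, as in this toy illustration with made-up numbers.)

    import math

    # Made-up per-word log-probabilities for one caption; im2txt's beam search
    # sums these and reports p = exp(sum) next to each generated caption.
    word_logprobs = [-0.8, -1.2, -0.5, -1.0, -0.6]
    print("p=%.6f" % math.exp(sum(word_logprobs)))  # p=0.016573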
I'm excited to release the pre-trained model (if I'm allowed to) to the public during my GSoC journey, to enable everyone to use it even if they do not have enough resources. I think it would be a great contribution to both Apache Tika and the computer vision community as a whole.

3) I am working on the schedule. I will be submitting a draft on the GSoC page. Should I send it here, too?

Regarding my other commitments, I will be working with Amazon India Development Centre from May 10th to July 10th. They offer flexible working hours. I would be able to dedicate 40-45 hours per week. My ability to balance both is shown by the fact that I am already working with the Deep Learning Research Group at NITW alongside my coursework.

What do you think?

On Mon, Mar 27, 2017 at 11:00 PM, Thamme Gowda <[email protected]> wrote:

Hi Kranthi Kiran,

1. Thanks for the update. I look forward to your PR.

2. I don't have complete details about compute resources from GSoC. I think Google offers free credits (approx. $300) when students sign up for Google Compute Engine. I am not worried about it at this time; we can sort it out later.

3. Great to know!

Best,
TG
--
Thamme Gowda
TG | @thammegowda <https://twitter.com/thammegowda>
~Sent via somebody's Webmail server!

On Fri, Mar 24, 2017 at 10:42 PM, Kranthi Kiran G V <[email protected]> wrote:

Apologies if I was ambiguous.

1) I have already started working on the improvement. The general method is working. I'll send a merge request after I port the REST method, too.

2) I was referring to the computational resources needed to train the final layer of im2txt to output the captions. Google hasn't released a pre-trained model.

3) I will update the developer community with a tentative GSoC schedule by tonight. It would be great if the community gives me suggestions.

On Mar 25, 2017 12:06 AM, "Thamme Gowda" <[email protected]> wrote:

Hi Kranthi Kiran,

Please find my replies below. Let me know if you have more questions.

Thanks,
TG

On Tue, Mar 21, 2017 at 12:21 PM, Kranthi Kiran G V <[email protected]> wrote:

> Hello Thamme Gowda,
> Thank you for letting me know of the developer mailing list. I have created an issue [1], and I will be working on it.
> The change is not straightforward, since the Inception V3 pre-trained model ships as a graph while the Inception V4 pre-trained model is packaged in the form of a checkpoint (ckpt) [2].

Okay, I see: Inception V3 has a graph, V4 has a checkpoint. I assume there should be a way to restore the model from a checkpoint; a sketch of the mechanism follows below the quoted message. Please refer to https://www.tensorflow.org/programmers_guide/variables#checkpoint_files

> What do you think of using Keras to implement the Inception V4 model? It would make the job of scaling it on CPU clusters easier if we can use deeplearning4j's model import. Should I proceed in that direction?
>
> Regarding GSoC, what kind of computation resources are we given access to? We would have to train the Show and Tell network, and it takes a lot of computational resources. If GPUs are not available, we would have to use a CPU cluster, so the code would have to be rewritten (from the Google implementation of Inception V4).

Training Inception V4 from scratch requires too much effort, time, and resources. We are not aiming for that, at least not as part of Tika and GSoC. The suggestion I mentioned earlier was to upgrade the Inception V3 model to the Inception V4 pre-trained model/checkpoint, since that will be more beneficial to the Tika user community :-)

> [1] https://issues.apache.org/jira/browse/TIKA-2306
> [2] https://github.com/tensorflow/models/tree/master/slim#pre-trained-models
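Here is the kind of restore I have in mind: a toy tf.train.Saver round trip. The single variable is only a stand-in; for the real upgrade, the graph would have to declare variables whose names and shapes match what is stored in the Inception V4 checkpoint.

    import tensorflow as tf

    # Toy variable standing in for the Inception graph; the real graph must
    # define variables matching the names/shapes stored in the checkpoint.
    w = tf.get_variable("weights", shape=[10], initializer=tf.zeros_initializer())
    saver = tf.train.Saver()  # by default, maps every variable in the graph

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        path = saver.save(sess, "/tmp/demo.ckpt")  # write a checkpoint ...

    tf.reset_default_graph()
    w_restored = tf.get_variable("weights", shape=[10])
    restorer = tf.train.Saver()
    with tf.Session() as sess:
        restorer.restore(sess, path)  # ... and restore it; no initializer needed
        print(sess.run(w_restored))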
On Mon, Mar 20, 2017 at 3:17 AM, Thamme Gowda <[email protected]> wrote:

Hi Kranthi Kiran,

Welcome to the Tika community. We are glad you are interested in working on the issue. Please remember to CC the dev@tika mailing list for future discussions related to Tika.

*Should the model be trainable by the user?*

The basic minimum requirement is to provide a pre-trained model and make the parser work out of the box without training (expect no GPUs; expect a JVM and nothing else). Of course, the parser configuration should have options to change the models by changing the path.

As part of this GSoC project, integration alone isn't enough work. If you go through the links provided on the Jira page, you will notice that there are models for image recognition but no ready-made models for captioning. We will have to train the im2txt network on the dataset and make it available. Thus, we will have to open-source the training utilities, documentation, and any supplementary tools we build along the way. We will have to document all of this in the Tika wiki for advanced users!

This is a GSoC issue, and thus we expect the work to happen during the summer.

For now, if you want a small task to familiarise yourself with Tika, I have a suggestion. Currently, Tika uses the Inception V3 model from Google for image recognition. The Inception V4 model came out recently and has proved to be more accurate than V3. How about upgrading Tika to use the newer Inception model?

Let me know if you have more questions.

Cheers,
TG

On Sun, Mar 19, 2017 at 11:56 AM, Kranthi Kiran G V <[email protected]> wrote:

Hello,

I'm Kranthi, a 3rd-year computer science undergrad at NIT Warangal and a member of the Deep Learning research group at our college. I'm interested in taking up the issue. I believe it would be a great contribution to the Apache Tika community.

This is what I have done so far:

1) Built Tika from source using Maven and explored it.
2) Tried the object recognition module from the command line. (I should probably start using the Docker version to speed up my progress.)

I am yet to import a Keras model in DL4J. I have some doubts regarding the requirements since I'm new to this community. *Should the model be trainable by the user?* This is important because the Inception V3 model has performed poorly for me without re-training (I'm currently training it for a small number of steps due to the limited computational resources I have: a GTX 1070).

TODO (before submitting the proposal):

1) Create a test REST API for Tika (a rough sketch is in the PS below).
2) Import a few models in DL4J.
3) Train im2txt on my computer.

Thank you,
Kranthi Kiran
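PS: For TODO item 1, I am imagining something like the following against a locally running tika-server. The default port (9998) and the image path are assumptions on my part, and the object recognition parser has to be enabled in the Tika config for recognition output to appear.

    import requests

    # Assumes tika-server runs locally on its default port (9998) with the
    # object recognition parser enabled; the image path is a placeholder.
    TIKA_RMETA = "http://localhost:9998/rmeta"

    with open("test-image.jpg", "rb") as f:
        resp = requests.put(
            TIKA_RMETA,
            data=f.read(),
            headers={"Content-Type": "image/jpeg"},
        )
    resp.raise_for_status()

    # /rmeta returns a JSON list of metadata maps, one per (embedded) document;
    # the recognition output should show up among the metadata keys.
    for doc in resp.json():
        for key in sorted(doc):
            print("%s = %s" % (key, doc[key]))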