[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector

2015-05-07 Thread Bertrand Delacretaz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532136#comment-14532136
 ] 

Bertrand Delacretaz commented on SLING-4694:


I'm thinking of ContentAwareMimeTypeService as a better way to implement this, 
which is reusable in other contexts.

We might use a detectionMode service property to make several variants of 
ContentAwareMimeTypeService available (filename-only, using Tika, etc.) and 
clients (like the webdav servlet) can then select one based on that property.
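
A minimal sketch of how a client might pick a variant by such a property. In an 
OSGi container this lookup would presumably be done with a service filter on 
the property; the plain-Java registry below only imitates that. The class 
names, the simplified interface and the property values "filename" and "tika" 
are assumptions made for illustration, not existing Sling API.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class DetectionModeSketch {

    // Hypothetical, simplified stand-in for the proposed service.
    interface ContentAwareMimeTypeService {
        String getMimeType(String filename, InputStream content);
    }

    // Imitates a service registry: each service is registered with properties.
    static final class Registry {
        private static final class Registration {
            final ContentAwareMimeTypeService service;
            final Map<String, String> properties;

            Registration(ContentAwareMimeTypeService service, Map<String, String> properties) {
                this.service = service;
                this.properties = properties;
            }
        }

        private final List<Registration> registrations = new ArrayList<>();

        void register(ContentAwareMimeTypeService service, Map<String, String> properties) {
            registrations.add(new Registration(service, properties));
        }

        // Returns the first service whose detectionMode property matches, if any.
        Optional<ContentAwareMimeTypeService> select(String detectionMode) {
            for (Registration r : registrations) {
                if (detectionMode.equals(r.properties.get("detectionMode"))) {
                    return Optional.of(r.service);
                }
            }
            return Optional.empty();
        }
    }
}
```

A client such as the webdav servlet would ask for the mode it wants and could 
fall back to the basic filename-based lookup when no variant matches.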

 SlingWebDavServlet should have a configurable Tika detector
 ---

 Key: SLING-4694
 URL: https://issues.apache.org/jira/browse/SLING-4694
 Project: Sling
  Issue Type: Improvement
  Components: Servlets
Affects Versions: JCR Webdav 2.2.2
Reporter: Satya Deep Maheshwari

 *Problem description:* I am facing a problem with the mime type detection of 
 a file. While debugging, I see that SlingTikaDetector.detect method is used 
 for detecting the mime type of my file. See [1]. This method just seems to 
 rely on the name of the file for detecting its mime type. Even though it's 
 passed an input stream of the file, it does not seem to use it for mime type 
 detection. So if my file name is something like xyz.tmp, it detects its mime 
 type as application/octet-stream (the default) while it may actually be a png 
 file. This is a common scenario with webdav clients wherein temporary files 
 get created with such names while being edited.
 *Suggested Solution:* 
 Quoting [~rombert]
 {quote}
 Following the discussions at SLING-1059 [1] and SLING-255 [2] I can
 infer that we more or less opted out of the 'heavy-weight' approach of
 actually parsing the input stream. Not sure if we want to revisit that
 TBH. At any rate, our MimeTypeService does not have an API for getting
 the file content based on the input stream.
 I think though there's a way around it, but only at the code level.
 The org.apache.sling.jcr.webdav.impl.helper.SlingResourceConfig class
 hardcodes the Detector implementation to be a SlingTikaDetector.
 I think it is worthwhile to raise a Jira issue for this and it would
 definitely expedite the fix if you're willing to submit a patch / pull
 request. I think it can be as simple as adding a @Reference to a Tika
 Detector to the SlingWebDavServlet and then passing that to the
 SlingServletConfig.
 Cheers,
 Robert
 [1]: https://issues.apache.org/jira/browse/SLING-1059
 [2]: https://issues.apache.org/jira/browse/SLING-255
 {quote}
 Related mailing-list thread on this: 
 http://apache-sling.73963.n3.nabble.com/mime-type-detection-td4050586.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector

2015-05-07 Thread Satya Deep Maheshwari (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532177#comment-14532177
 ] 

Satya Deep Maheshwari commented on SLING-4694:
--

I like this idea. One problem that I see with it, though, is that the 
current SlingTikaDetector would use ContentAwareMimeTypeService, and 
ContentAwareMimeTypeService would itself need a Tika detector to figure out the 
mime type. So we have two detectors coming into the picture to figure out the 
mime type, and these need to be different ones (how?) to avoid a recursive 
situation.

I would want to have MimeTypeService configurable as you suggest, but perhaps 
also make SlingWebDavServlet configurable for the detector for this special 
case wherein MimeTypeService is itself being used by a Tika detector, i.e. 
SlingTikaDetector.



[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector

2015-05-07 Thread Bertrand Delacretaz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532196#comment-14532196
 ] 

Bertrand Delacretaz commented on SLING-4694:


SlingTikaDetector uses Tika APIs but it can be just a bridge for the new 
ContentAwareMimeTypeService, with no actual detection logic.
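
A minimal sketch of such a bridge, assuming simplified stand-in interfaces. The 
real Tika Detector works with MediaType and Metadata objects rather than plain 
strings, and ContentAwareMimeTypeService is only a proposal at this point, so 
every type below is hypothetical and defined locally for illustration.

```java
import java.io.InputStream;

final class BridgeSketch {

    // Hypothetical stand-in for the proposed Tika-independent service API.
    interface ContentAwareMimeTypeService {
        String getMimeType(String filename, InputStream content);
    }

    // Hypothetical, simplified stand-in for a Tika-style detector.
    interface SimpleDetector {
        String detect(InputStream input, String resourceName);
    }

    // The bridge has no detection logic of its own: it adapts one API to the
    // other and only supplies the default type when the service finds nothing.
    static final class BridgeDetector implements SimpleDetector {
        private final ContentAwareMimeTypeService delegate;

        BridgeDetector(ContentAwareMimeTypeService delegate) {
            this.delegate = delegate;
        }

        @Override
        public String detect(InputStream input, String resourceName) {
            String type = delegate.getMimeType(resourceName, input);
            return type != null ? type : "application/octet-stream";
        }
    }
}
```

Because the bridge only delegates, the service behind it can use any detection 
strategy without calling back into the bridge, which sidesteps the recursion 
concern.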

The current SlingTikaDetector tries to extract a filename from the 
Metadata.RESOURCE_NAME_KEY String, which should also IMO be a service 
(FilenameExtractor) provided by the Sling mimetype bundle.

SlingTikaDetector also falls back to the Tika MediaType.parse(..) method if the 
Sling MimeTypeService does not find a mime type; this can also be a feature of 
a Tika-based ContentAwareMimeTypeService instead.

To avoid any risks of Tika API incompatibilities, I would embed the required 
Tika classes in the webdav bundle, and make the ContentAwareMimeTypeService API 
independent of Tika.

Note that we must make sure the affected webdav bundle code has sufficient test 
coverage in this area before making any changes, so that those changes 
stay backwards compatible.



[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector

2015-05-07 Thread Oliver Lietz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532217#comment-14532217
 ] 

Oliver Lietz commented on SLING-4694:
-

The content/mime type detector should really be general so it could be used in 
other bundles. I've started work on a detector in content loader, but 
currently have no time to continue:

{noformat}
public interface ContentTypeDetector {

String detectContentType(InputStream contentStream, String filename);

}
{noformat}

[~tomek.rekawek], these classes should be removed as they are not used right 
now. Are you still working on content loader, and will you clean up further?



[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector

2015-05-07 Thread Tomek Rękawek (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532263#comment-14532263
 ] 

Tomek Rękawek commented on SLING-4694:
--

I finished my work on the content loader (my main purpose was to provide the 
pluggable readers), but I can help clean the module up. Do you want me to 
remove the ContentTypeDetector?



[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector

2015-05-07 Thread Satya Deep Maheshwari (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532132#comment-14532132
 ] 

Satya Deep Maheshwari commented on SLING-4694:
--

bq. It would be generally useful IMO to create an extended MimeTypeService

[~bdelacretaz], are you suggesting this in addition to making the Tika detector 
configurable in the webdav servlet, or as an alternative way of implementing this?



[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector

2015-05-07 Thread Bertrand Delacretaz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532223#comment-14532223
 ] 

Bertrand Delacretaz commented on SLING-4694:


bq. The content/mime type detector should really be general so it could be used 
in other bundles.

Yes that's the idea of my {{ContentAwareMimeTypeService}} suggestion. Best is 
probably to create a new commons/content-detection bundle that provides those 
enhanced APIs.

I won't have time to work on this in the next few days, so just throwing ideas 
around ATM.



[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector

2015-05-07 Thread Satya Deep Maheshwari (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532209#comment-14532209
 ] 

Satya Deep Maheshwari commented on SLING-4694:
--

bq. SlingTikaDetector uses Tika APIs but it can be just a bridge for the new 
ContentAwareMimeTypeService, with no actual detection logic.

Makes sense. This clarifies things substantially.



[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector

2015-05-07 Thread Oliver Lietz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532472#comment-14532472
 ] 

Oliver Lietz commented on SLING-4694:
-

Done.



[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector

2015-05-07 Thread Satya Deep Maheshwari (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532399#comment-14532399
 ] 

Satya Deep Maheshwari commented on SLING-4694:
--

I'll try to come up with an implementation of 
{{ContentAwareMimeTypeService}} as discussed.



[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector

2015-05-06 Thread Satya Deep Maheshwari (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530414#comment-14530414
 ] 

Satya Deep Maheshwari commented on SLING-4694:
--

I am working on an approach wherein the Sling webdav servlet would be passed a 
Tika detector via @Reference. The current SlingTikaDetector would also expose 
itself as a service, and the webdav servlet would get a reference to it for 
further use. Alternatively, one can pass another Tika detector to the webdav 
servlet by exposing it as a service. Sling quickstart already includes a Tika 
OSGi bundle which provides the default Tika detector, which can be used as an 
alternative to the internal SlingTikaDetector.



[jira] [Commented] (SLING-4694) SlingWebDavServlet should have a configurable Tika detector

2015-05-06 Thread Bertrand Delacretaz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530420#comment-14530420
 ] 

Bertrand Delacretaz commented on SLING-4694:


It would be generally useful IMO to create an extended {{MimeTypeService}} to 
be able to do content-based mime type detection.

Maybe something like

{code}
import java.io.InputStream;

import org.apache.sling.commons.mime.MimeTypeService;

public interface ContentAwareMimeTypeService extends MimeTypeService {
  /**
   * @param filename used if content is null or if this service does not
   *   support content-based detection
   * @param content optional stream that points to the content to analyze
   *   (TODO explain any relevant constraints on that stream, does it need to
   *   support mark/reset etc)
   */
  String getMimeType(String filename, InputStream content);
}
{code}

The webdav code can then use this if available, preferring it to the basic 
MimeTypeService.
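
A minimal sketch of what a content-based implementation might do for the 
xyz.tmp case from the issue description. It checks only the PNG signature and 
falls back to the filename otherwise; a real implementation would more likely 
delegate to Tika. The class and method names are hypothetical, and requiring 
mark/reset support on the stream is an assumption of this sketch, not a 
decision made in this issue.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Locale;

final class SniffingMimeTypeDetector {

    // The eight-byte signature that starts every PNG file.
    private static final byte[] PNG_SIGNATURE =
            {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

    // Filename-only lookup, standing in for the basic MimeTypeService.
    static String byFilename(String filename) {
        if (filename != null && filename.toLowerCase(Locale.ROOT).endsWith(".png")) {
            return "image/png";
        }
        return "application/octet-stream";
    }

    // Prefers the content when it can be inspected, else uses the filename.
    static String getMimeType(String filename, InputStream content) {
        if (content != null && content.markSupported()) {
            try {
                if (startsWithPngSignature(content)) {
                    return "image/png";
                }
            } catch (IOException e) {
                // Content could not be read: fall through to the filename.
            }
        }
        return byFilename(filename);
    }

    // Reads the first bytes and then resets the stream for later consumers.
    private static boolean startsWithPngSignature(InputStream content) throws IOException {
        content.mark(PNG_SIGNATURE.length);
        try {
            byte[] head = new byte[PNG_SIGNATURE.length];
            int read = 0;
            while (read < head.length) {
                int n = content.read(head, read, head.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return read == head.length && Arrays.equals(head, PNG_SIGNATURE);
        } finally {
            content.reset();
        }
    }
}
```

Resetting the stream after sniffing matters because the caller still needs to 
store the full content afterwards.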
